How to parse a string and create rows in SQL (postgres) - sql

I have a single database field that contains a start date, end date, and exclusions in the form
available DD/MONTH/YYYY [to DD/MONTH/YYYY]?[, exclude WORD [, WORD]*]?
Meaning it always starts with "available DD/MONTH/YYYY", optionally has a single "to DD/MONTH/YYYY", and optionally has an exclude clause that is a comma-separated list of strings. Read +, *, and ? with their regular expression meanings.
I have been tasked with extracting the data so that we will have a "startdate" column, an "enddate" column, and a new table that will contain the exclusions. The migration needs to fill the startdate and enddate columns with the values parsed from the availability string. It also needs to create multiple records in the new exclusion table, one for each of the comma-separated values after the 'exclude' keyword in the availability string.
Is this a migration I can do in SQL only? This is against Postgres 8.4.
Update: With the help of a co-worker we now have a SQL script whose output is the SQL insert statements for the exclusions. It uses a bunch of CASE statements and string manipulation within the SQL to generate the results. I then send the output to a file and execute that file to perform the inserts. I am doing the same for the start and end date columns.
It's not 100% SQL, but a simple .bat or .sh file that runs the first .sql file and then the generated one is all that is needed.
Thanks for the input.

You can probably do that with a combination of the regexp functions and the to_date() or to_timestamp() functions.
But it may be easier to just mangle the text in a function written in, say, PL/Perl. That gives you access to the full string manipulation functions of Perl while keeping the work inside the database, which seems to be your requirement.

Why SQL only?
Write a simple script in Ruby/Python/Basic that reads the data from the source, parses it, and puts it into the destination database.
Or is the data too big for that?
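As a rough illustration of the script approach suggested above, here is a minimal Python sketch of the parsing step only. The function name parse_availability and the sample strings are made up for illustration. The question does not say whether MONTH is a name or a number, so the sketch matches either and returns the dates as raw strings rather than guessing a conversion.

```python
import re

# Assumed grammar, taken from the question:
#   available DD/MONTH/YYYY [to DD/MONTH/YYYY]?[, exclude WORD [, WORD]*]?
# MONTH is matched as any run of letters or digits because the question
# does not say whether it is a month name or a month number.
DATE = r"\d{1,2}/\w+/\d{4}"
PATTERN = re.compile(
    rf"^available\s+(?P<start>{DATE})"
    rf"(?:\s+to\s+(?P<end>{DATE}))?"
    r"(?:\s*,\s*exclude\s+(?P<excl>.+))?$"
)

def parse_availability(text):
    """Return (start, end, exclusions); end is None when absent."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised availability string: {text!r}")
    excl = match.group("excl")
    # One list entry per comma-separated word after 'exclude'.
    exclusions = [w.strip() for w in excl.split(",")] if excl else []
    return match.group("start"), match.group("end"), exclusions
```

Each returned exclusion would become one row in the new exclusion table, and the start and end strings would still need converting to dates before being written to the new columns.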

Related

SQL Server: Replace all instances of a string across all columns in a select statement

I have an issue where a BCP call to one of our stored procedures returns the NUL (\0) value character whenever it finds a column with an empty string as its content. This NUL cannot be processed by the caller, and I need to find a solution on either SQL script level or data insertion level to fix this.
My idea is to replace all empty strings with a proper NULL, as that is returned by BCP as an empty string. For a small table this could be done by CASE statements or similar, but my table has around 50 varchar columns with tens of thousands of rows being pulled at once, so I'm worried this would result in big performance issues.
Is there a way to manipulate the output of a select statement so that all occurrences of a certain character/string, across all columns and rows, are replaced with another character/string?
Your best bet is going to be either to update the data, update your process to properly handle empty strings, or use something like NULLIF(col, '') on the selected columns.
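With around 50 columns, the NULLIF select list could be generated rather than typed by hand. The following is only a sketch: the helper nullif_select and the demo table are hypothetical, and an in-memory SQLite database stands in for SQL Server purely to show the effect of NULLIF(col, ''), which is a standard SQL function.

```python
import sqlite3

def nullif_select(table, columns):
    """Build a SELECT that wraps every listed column in NULLIF(col, '')."""
    select_list = ", ".join(f"NULLIF({c}, '') AS {c}" for c in columns)
    return f"SELECT {select_list} FROM {table}"

# In-memory SQLite is only a stand-in for SQL Server in this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (a TEXT, b TEXT)")
conn.execute("INSERT INTO demo VALUES ('x', ''), ('', 'y')")

query = nullif_select("demo", ["a", "b"])
rows = conn.execute(query).fetchall()  # empty strings come back as None (NULL)
```

The column names are interpolated directly into the SQL text, so this is only suitable for trusted column lists such as ones read from the table's own metadata.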

Convert a comma-separated string into different rows in a single query in InfluxDB

I have a table in InfluxDB which, for optimization, contains a field that holds either a single value, for example "FR204", or several values concatenated with commas, as in "FR204, FR301".
The fact is that I would like to be able to do exactly what is reflected in this case: https://rstopup.com/convertir-una-cadena-separada-por-comas-en-cada-una-de-las-filas.html
Is it possible to do this in InfluxDB? Thanks.
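For reference, this is what the desired split looks like when done client-side on rows that have already been fetched. It is a sketch in Python and says nothing about whether InfluxDB can do the same inside a query; the function name explode_codes and the field names "time" and "code" are made up for illustration.

```python
def explode_codes(rows, field="code"):
    """Yield one copy of each row per comma-separated value in `field`."""
    for row in rows:
        for value in row[field].split(","):
            value = value.strip()
            if value:
                # Copy the row, replacing the combined field with one value.
                yield {**row, field: value}

points = [
    {"time": 1, "code": "FR204"},
    {"time": 2, "code": "FR204, FR301"},
]
exploded = list(explode_codes(points))
```

Here the second point becomes two rows, one for "FR204" and one for "FR301", each keeping the original timestamp.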

Convert all selected columns to_char

I am using oracle SQL queries in an external Program (Pentaho Data Integration (PDI)).
I need to convert all columns to string values before I can proceed with using them.
What I am looking for is something that automatically applies the
select to_char(col1), to_char(col2),..., to_char(colN) from example_table;
to all columns, so that you might at best wrap this statement:
select * from example_table;
and all columns are automatically converted.
For explanation: I need this because PDI doesn't seem to work fine when getting uncasted DATE columns. Since I have dynamic queries, I do not know if a DATE column exists and simply want to convert all columns to strings.
EDIT
Since the queries vary and since I have a long list of them as an input, I am looking for a more generic method than just manually writing to_char() in front of every column.
If you are looking for a solution in PDI, you need to create a job (.kjb) containing two transformations. The first .ktr rebuilds the query and the second .ktr executes the new query.
1. First Transformation: Rebuild the query
Read the columns in the source table step (use the Table Input step in your case). Write the query select * from example_table; and limit the rows to either 0 or 1. The idea here is not to fetch all the rows but to recreate the query.
Use the Meta Structure step to get the meta-structure of the table. It will give you the list of columns coming in from the previous step.
In the Modified JavaScript step, use a small snippet of code to check whether the data type of a column is Date and, if so, wrap it as to_char(column) in the rows.
Finally, group the fields and set the result into a variable.
This is the point where the fields are recreated for you automatically. The next step is to execute the new query built from these fields.
2. Second Transformation: Use the variable set above in the next step to get the result. ${NWFIELDNAME} is the variable you set with the modified columns in the first transformation.
Hope this helps :)
I have placed the code for the first ktr in gist here.
select TO_CHAR(*) from example_table;
You should not use * in your production code, it is a bad coding practice. You should explicitly mention the column names which you want to fetch.
Also, TO_CHAR(*) makes no sense. How would you convert date to string? You must use proper format model.
In conclusion, it would take a minute or two at most to list the column names using a good text editor.
I cannot imagine an application that does not know about the actual data types, but if you really want to automa(gi)cally convert all columns to strings, I see two possibilities in Oracle:
If your application language allows you to specify the binding type, you simply bind all your output variables to a string variable. The Oracle driver then takes care of converting all types to strings; this is possible, for example, with JDBC (Java).
If (as it seems) your application language does not allow the first solution, the best way I can think of is to define a view for each select you want to use, with the appropriate TO_CHAR conversions already in place, and then select from the view. Those views could also be generated automatically from the data dictionary (user_tables) with some PL/SQL.
Please also note that TO_CHAR will convert your columns according to the NLS settings of your session, which might lead to unwanted results, so you might want to always specify how to convert:
SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD') FROM DUAL;
Using these two tables, you could write a procedure which looks at the columns of each table and then performs the appropriate TO_CHAR depending on the current data type:
select * from user_tab_columns
select * from user_tables
pseudo code
begin
  loop on table -- user_tables
    loop on column -- user_tab_columns
      if current data_type = DATE then
        lnewColumn = TO_CHAR(oldColumn...)
      elsif current data_type = NUMBER then
        ...
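The same loop can be sketched outside the database. The Python below is only an illustration under assumptions: the function name build_char_select, the default date format, and the treatment of NUMBER columns (a plain TO_CHAR with no format) are not from the answers above, and the (column_name, data_type) pairs are assumed to have been read from user_tab_columns beforehand.

```python
def build_char_select(table, columns, date_format="YYYY-MM-DD HH24:MI:SS"):
    """Build a SELECT that converts DATE and NUMBER columns with TO_CHAR.

    columns: iterable of (column_name, data_type) pairs, e.g. as read
    from user_tab_columns. Other data types are selected unchanged.
    """
    parts = []
    for name, data_type in columns:
        if data_type == "DATE":
            # Explicit format so the result does not depend on NLS settings.
            parts.append(f"TO_CHAR({name}, '{date_format}') AS {name}")
        elif data_type == "NUMBER":
            # Assumption: default TO_CHAR formatting is acceptable for numbers.
            parts.append(f"TO_CHAR({name}) AS {name}")
        else:
            parts.append(name)
    return f"SELECT {', '.join(parts)} FROM {table}"
```

For example, build_char_select("example_table", [("id", "NUMBER"), ("created", "DATE")]) produces a statement that could be executed in place of select * from example_table.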

In Oracle, how do you select multiple values from a related table and store them in a single column?

I'm selecting columns from one table and would like to select all values of a column from a related table when the two tables have a matching value, separate them by commas, and display them in a single column with my results from table one.
I'm fairly new to this and apologize ahead of time if I'm not wording it correctly.
It sounds like what you're trying to do is to take multiple rows and aggregate them into a single row by concatenating string values from one or more columns. Yes?
If that's the case, I can tell you that it's a more difficult problem than it seems if you want to do it using portable SQL - especially if you don't know ahead of time how many items you may get.
The Oracle-specific solution often used in such cases is to implement a custom aggregate function - STRAGG(). Here's a link to an article that describes exactly how to do so and has examples of its usage.
If you're on Oracle 9i or later and are willing to live with using undocumented functions (that could change in the future), you can also look at the WM_CONCAT() function - which does much the same thing.
You want a row aggregation or concatenation function. Your choices are:
If you are using Oracle 11gR2, there is a built-in function to aggregate strings with a delimiter, called LISTAGG(column, delimiter), which is used together with a WITHIN GROUP (ORDER BY ...) clause.
If you are using any earlier release of Oracle database, you can use the WM_CONCAT(column) function; however, you have no choice of delimiter and will have to use something like the TRANSLATE(string, string_to_replace, replacement_string) function to change the delimiter afterwards, provided your data does not contain commas.
As mentioned by LBushkin, you can create a custom function in your schema to perform row aggregation for you. Here is PL/SQL code example for one: http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php#user_defined_aggregate_function
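To make the idea of row aggregation concrete, here is a small runnable sketch. It does not use Oracle: it uses Python's built-in sqlite3 module, where group_concat plays roughly the role that LISTAGG or WM_CONCAT plays in the answers above. The table and column names (parent, child, val) are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child (parent_id INTEGER, val TEXT);
    INSERT INTO parent VALUES (1, 'first'), (2, 'second');
    INSERT INTO child VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# One output row per parent, with all matching child values joined by commas.
rows = conn.execute("""
    SELECT p.name, group_concat(c.val, ',')
    FROM parent p JOIN child c ON c.parent_id = p.id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()
```

Unlike LISTAGG with its ORDER BY clause, this SQLite query does not guarantee the order of the values inside each concatenated string.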

SQL Server 2005 Import from Excel

I'd like to know what my best option would be to import data from an excel file on a weekly or monthly basis. At first, I thought I would use SSIS, but after much struggle with seemingly simple tasks, I'm starting to rethink my plan. Would it be better/easier to just write the SQL by hand or use the services of an SSIS package? The basic process will be as follows:
A separate process will download an .xls file to a local fileshare.
The xls file will have a filename like: 'myfilename MON YY'.
I will need to read the month and year from the filename, reformat it to a SQL date and then query a DimDate table to find the corresponding date key.
For each row (after the first 2 header rows), insert the data with the date key, unless the row is a total row, then ignore.
Here are some of the issues I've been encountering with SSIS:
I can parse the date string from a flat file data source, but can't seem to do it with an Excel data source. Also, once parsed, I cannot seem to convert the string to a date in order to perform the lookup for the date key. For example, I want to do something like this:
select DateKey from DimDate
where ActualDate = convert(datetime, '01-' + 'JAN-10', 120)
but I don't think it is possible to use the 'convert' or 'datetime' keywords in an expression builder. I have also been unable to find where I can edit the SQL to ignore the first 2 rows of data.
I'm very skeptical of using SSIS because it seems like a kludgy way of doing something that can probably be accomplished more efficiently by writing the SQL yourself, but I may be forced to use SSIS. Thoughts?
SSIS is definitely the direction to go.
To hit on your problems: (DT_DBTIMESTAMP) is the conversion you want. The syntax is a bit different. For instance to convert your example date I would use:
(DT_DBTIMESTAMP)"01/01/2010"
If you use that expression in a derived column to replace your string date (or create a new column), you could then do a lookup against datetime columns in a DB.
If you need to exclude the first two rows, you will either need to write an SQL statement to query the file (as opposed to an excel file reader source) or use a conditional split to throw them away based on any condition that can be repeated with every import.
Flat files are easier to work with, and do allow you to throw away x number of initial rows.
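Independent of SSIS, the filename step from the question (read 'MON YY' and turn it into a date for the DimDate lookup) can be sketched in a few lines of Python. This is only an illustration: the function name date_from_filename is made up, the exact filename layout is assumed to be 'myfilename MON YY' with an optional .xls extension, and two-digit years are assumed to mean 20YY.

```python
import re
from datetime import date

# Fixed English abbreviations, so the result does not depend on the locale.
MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

def date_from_filename(filename):
    """Return the first day of the month named at the end of the filename.

    Assumes names like 'myfilename JAN 10.xls' and that YY means 20YY.
    """
    match = re.search(r"\b([A-Za-z]{3}) (\d{2})(?:\.xls)?$", filename)
    if match is None or match.group(1).upper() not in MONTHS:
        raise ValueError(f"no 'MON YY' suffix in {filename!r}")
    return date(2000 + int(match.group(2)), MONTHS[match.group(1).upper()], 1)
```

The resulting date could then be passed as a parameter to the DimDate query instead of building the date string inside the SQL.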