SQL to Split Rows into Multiple Columns

I have the table TEST below, with a single column DATA:
00001900-01-01Aseopenigaccount-RF RF-ADIT
00341900-02-01Aseopenigaccount-RASS RASS-ADIT
00761900-03-01Adminopenigaccount-RASS OPEN-System
I need the DATA column above split into the following 4 columns:
Code Date       Description             ShortDesc
0000 1900-01-01 Aseopenigaccount-RF     RF-ADIT
0034 1900-02-01 Aseopenigaccount-RASS   RASS-ADIT
0076 1900-03-01 Adminopenigaccount-RASS OPEN-System

@at9063, welcome to the community. As the comments indicate, you should include a sample of what you have tried in future questions. It would also be helpful to state any logical assumptions behind your dataset.
The solution is based on the data that you provided as an example. The first two columns can be extracted by taking the first 4 characters and the following 10. The Description column starts at the 15th character and runs up to the first space. ShortDesc starts right after the first space.
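-- SQL Server 2017+ (TRIM); fixed-width Code (4 chars) and Date (10 chars)
SELECT LEFT(TRIM(my_data), 4) AS My_Code,
SUBSTRING(TRIM(my_data), 5, 10) AS my_date,
-- Description: from character 15 up to, but not including, the first space
SUBSTRING(TRIM(my_data), 15, CHARINDEX(' ', TRIM(my_data)) - 15) AS My_Description,
-- ShortDesc: everything after the first space (no leading space)
SUBSTRING(TRIM(my_data), CHARINDEX(' ', TRIM(my_data)) + 1, LEN(TRIM(my_data)) - CHARINDEX(' ', TRIM(my_data))) AS my_ShortDesc
FROM test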
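To check the offsets outside SQL Server, here is a small Python sketch of the same fixed-width split. It assumes every DATA value follows the sample layout (4-character code, 10-character date, then a description running up to the first space, then the short description); note that Python slices are 0-based where SQL's SUBSTRING is 1-based, so "character 15" becomes index 14.

```python
def split_row(data: str) -> tuple[str, str, str, str]:
    """Split one DATA value into (Code, Date, Description, ShortDesc).

    Assumes the layout of the sample rows: 4-char code, 10-char date,
    then "Description ShortDesc" separated by the first space.
    """
    s = data.strip()
    code, date = s[:4], s[4:14]  # SQL positions 1-4 and 5-14
    # partition splits at the first space only, like CHARINDEX(' ', ...)
    description, _, short_desc = s[14:].partition(" ")
    return code, date, description, short_desc


rows = [
    "00001900-01-01Aseopenigaccount-RF RF-ADIT",
    "00341900-02-01Aseopenigaccount-RASS RASS-ADIT",
    "00761900-03-01Adminopenigaccount-RASS OPEN-System",
]
for row in rows:
    print(split_row(row))
```

If a value has no space at all, partition returns the whole remainder as the description and an empty short description, whereas the SQL version's CHARINDEX would return 0 and the length arithmetic would go negative, so such rows are worth filtering out first.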


Is there a formal definition of Gaps and Islands problems? If so, does this problem satisfy it?

It seems that the term "gaps and islands" is overused in my workplace. I recently had essentially the following problem given to me under that banner.
Take a set of data with many rows, each containing lots of data, but in particular always including a start time column and a stop time column, plus many other columns where, if one is not NULL, then the others are NULL. For example:
Start Time  Stop Time  Drunkenness  Programming Ability
01          60         0100         NULL
10          20         NULL         0450
40          50         NULL         0250
(you may also use the obvious unpivoted equivalent, but don't worry about that)
and convert that data into a form where everything is collapsed so that you can find out what's true at any given time by looking only at the single row that corresponds to that time period. So, for the previous example, you want this:
Start Time  Stop Time  Drunkenness  Programming Ability
01          09         0100         NULL
10          20         0100         0450
21          39         0100         NULL
40          50         0100         0250
51          60         0100         NULL
To see that this is what you really want, look at the times in the original rows. Until time 10, only "Drunkenness=0100" is given, so the first row in the result must span from 01 to 09 and contain only Drunkenness information. The next row in the original table spans from 10 to 20, so we must have a row for that time period in the result, and it must contain all information that is true at that time (i.e. the "Drunkenness=0100" that is always true and the "Programming Ability=0450" that is true only between times 10 and 20). As "Programming Ability" is undefined from time 21 to 39, we must have yet another row where it is NULL. The other two rows are generated by the same process, so we get the table above.
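The process just described amounts to cutting the timeline at every start time and every stop time + 1, then reading off which source rows cover each piece. A minimal Python sketch, assuming integer, inclusive start/stop times, rows held as dicts (the keys start, stop, drunkenness and programming_ability are hypothetical names chosen for illustration), and that overlapping rows never disagree on the same column:

```python
def collapse(rows, value_columns):
    """Split the timeline into pieces on which the set of covering rows is constant."""
    # Every start and every stop + 1 is a point where coverage can change
    bounds = sorted({r["start"] for r in rows} | {r["stop"] + 1 for r in rows})
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        # A piece lies entirely inside or outside each row, so testing lo suffices
        covering = [r for r in rows if r["start"] <= lo <= r["stop"]]
        if not covering:
            continue  # a genuine gap: no source row says anything here
        piece = {"start": lo, "stop": hi - 1}
        for col in value_columns:
            # Assumes covering rows never carry conflicting values for col
            piece[col] = next((r[col] for r in covering if r.get(col) is not None), None)
        result.append(piece)
    return result


source = [
    {"start": 1, "stop": 60, "drunkenness": "0100"},
    {"start": 10, "stop": 20, "programming_ability": "0450"},
    {"start": 40, "stop": 50, "programming_ability": "0250"},
]
for piece in collapse(source, ["drunkenness", "programming_ability"]):
    print(piece)
```

Adjacent pieces that end up with identical values are not merged here; for the sample data that never happens, but a final pass comparing neighbouring pieces would handle it.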
Is this really a "gaps and islands" problem? Or does the literature give it a different name? I agree that there are gaps in the first dataset and that the results in the final dataset are split into islands, but that doesn't seem to be what the literature is referring to when it talks of "gaps and islands" problems. The literature seems to care about finding gaps or finding islands, rather than turning gaps into islands and merging the data like this.
The SQL tag is used because the data lives in a relational database. I am not asking for solutions, and I doubt that an SQL solution in your answer would be enlightening, although one would be welcome. I have therefore not included any SQL code in this question.
I do not believe this question to be opinion-based. I have seen enough coverage of gaps and islands problems to believe that there must be a formal definition of them somewhere. Answers are highly encouraged to provide a formal definition for these problems and a source for it. If this is not a gaps and islands problem but actually something else, then please give a name and a sourced definition for that.
The condition "if one is not NULL, then the others are NULL" means that your rows are just a different representation of key-value pairs. In other words, the unpivoted variant would look like the following:
Key                  Value  Start  End
Drunkenness          100    01     60
Programming Ability  450    10     20
Programming Ability  250    40     50
Assume that the data passes integrity checks, that is, there are no overlapping intervals with different values for the same key. Then it looks like a type-2 slowly changing dimension, and indeed we can interpret the absence of a value for Programming Ability between 20 and 40 (exclusive) as NULL.
However, one can also interpret the data as two separate tables, Drunkenness (a) and Programming Ability (b), merged (via a full join) on the start and end of the intervals.
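-- a = Drunkenness intervals, b = Programming Ability intervals,
-- each with columns start_time, end_time, value (END is reserved in most dialects)
SELECT coalesce(a.start_time, b.start_time) AS start_time,
       coalesce(a.end_time, b.end_time) AS end_time,
       a.value AS drunkenness, b.value AS programming_ability
FROM a FULL JOIN b ON a.start_time = b.start_time AND a.end_time = b.end_time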
So, for example, b has no row for (01,60), and you get NULL for Programming Ability in that row. You can get your second table if you join these two tables properly, accounting for overlapping time intervals.
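-- Intersect each pair of overlapping intervals (GREATEST/LEAST as in PostgreSQL, MySQL, Oracle)
SELECT greatest(a.start_time, b.start_time) AS start_time,
       least(a.end_time, b.end_time) AS end_time,
       a.value AS drunkenness, b.value AS programming_ability
FROM a FULL JOIN b ON a.start_time <= b.end_time AND b.start_time <= a.end_time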
Either way, it is not quite a gaps and islands problem. In that problem, the data has some overlapping intervals, possibly with gaps, and one has to determine the non-overlapping intervals of continuity (islands) separated by gaps of discontinuity.
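For contrast, the classic islands step can be sketched as merging overlapping intervals after sorting them by start. This is a hedged Python sketch assuming integer, inclusive times, where intervals that merely touch (e.g. ending at 15 and starting at 16) are treated as one island; drop the + 1 if touching intervals should stay separate.

```python
def islands(intervals):
    """Merge overlapping (or touching) integer intervals into islands."""
    merged = []
    for start, stop in sorted(intervals):
        # + 1: with inclusive integer times, 15 and 16 are contiguous
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], stop)  # extend the current island
        else:
            merged.append([start, stop])  # a gap precedes this interval
    return [tuple(m) for m in merged]


print(islands([(1, 5), (3, 8), (12, 15), (16, 20), (25, 30)]))
print(islands([(1, 60), (10, 20), (40, 50)]))
```

Applied to the question's three intervals, it returns the single island (1, 60), which illustrates why that dataset is not really about finding islands.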

Extracting data from SAS data set based on values with different length

I am looking to automate a process that uses a sales dataset with a specific column named SALESCODE, which holds 5-character codes.
Based on the input given by the user, I would like to filter the data. The problem is that the user can give multiple sales codes, and each code can be 5, 4, 3, 2 or 1 characters long. How can I filter out the required rows under this condition?
SALESCODE area value units rep
A10AA KR 100 10 Jay
B10AQ TN 120 12 Jrn
C10AH KR 200 10 Jay
T11TA TR 180 10 Jay
For example, if I give the input A10AA, B10A, T11, I should get the sales data with codes A10AA, B10AQ and T11TA. Kindly help.
Use the IN operator. Since you want to match values that start with the specified values, use the : modifier (IN:). Since your values are character values, make sure to include quotes.
proc print data=sales_data ;
where salescode in: ("A10AA" "B10A" "T11");
run;
If you want you can use commas between the values in the list, but I find it easier to type spaces instead.
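Outside SAS, the same "starts with any of these codes" filter is easy to express. A minimal Python sketch using the sample rows, where the comma-separated user_input string is a hypothetical stand-in for however the user supplies codes; str.startswith accepts a tuple of prefixes, which behaves like IN: as long as no prefix is longer than the 5-character codes.

```python
# Rows from the question's sample table
sales = [
    {"salescode": "A10AA", "area": "KR", "value": 100, "units": 10, "rep": "Jay"},
    {"salescode": "B10AQ", "area": "TN", "value": 120, "units": 12, "rep": "Jrn"},
    {"salescode": "C10AH", "area": "KR", "value": 200, "units": 10, "rep": "Jay"},
    {"salescode": "T11TA", "area": "TR", "value": 180, "units": 10, "rep": "Jay"},
]


def filter_by_prefix(rows, prefixes):
    """Keep rows whose salescode starts with any of the given prefixes."""
    wanted = tuple(prefixes)  # startswith takes a tuple and tests each prefix
    return [row for row in rows if row["salescode"].startswith(wanted)]


user_input = "A10AA, B10A, T11"  # hypothetical comma-separated user input
# Skip empty entries: "".startswith would otherwise match every row
prefixes = [code.strip() for code in user_input.split(",") if code.strip()]
print([row["salescode"] for row in filter_by_prefix(sales, prefixes)])
```

A full 5-character code such as A10AA is just a prefix of maximal length, so exact and partial codes can be mixed in one list, as in the SAS version.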

Breaking a column result into further pieces and putting it in another query in PostgreSQL?

One of my result columns returns values like 123/234 or 1234/45/567. Another table maps 123 to abc, 234 to cde, and so on. How can I get the value abc/cde? Please assist.
You're probably going to want a function that converts your id/id field into a name/name value.
http://sqlfiddle.com/#!17/2c336/3/0
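The function boils down to "split on /, look up each id, join the names back with /". A minimal Python sketch of that logic, where the lookup dict stands in for the second table and keeping unknown ids unchanged is an assumption (you might prefer to raise an error or emit a placeholder):

```python
def ids_to_names(value: str, lookup: dict[str, str], sep: str = "/") -> str:
    """Replace each id in a sep-delimited string with its name from lookup."""
    # Unknown ids are kept as-is (assumption; change to suit your data)
    return sep.join(lookup.get(part, part) for part in value.split(sep))


lookup = {"123": "abc", "234": "cde"}  # sample mapping from the question
print(ids_to_names("123/234", lookup))
print(ids_to_names("1234/45/567", lookup))
```

In PostgreSQL itself, the same split, look up and rejoin can also be written without a custom function, using string_to_array, unnest(...) WITH ORDINALITY joined to the lookup table, and string_agg(... ORDER BY ordinality) to preserve the original order.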

Retrieving statistical information when 2 rows are involved

I need to get some information from a dataset (CSV), which I have boiled down to the following simple table:
Date_Time Id passed
2013-06-23 20:13:10 112 A
2013-06-23 20:58:11 112 B
2013-06-23 21:01:10 118 A
2013-06-23 21:03:31 118 A
2013-06-23 21:05:49 118 A
2013-06-23 23:05:08 118 B
2013-06-24 08:10:03 118 B
The first two records show the simple case: after a check-in (A), we see a check-out (B) 0:45:01 later.
But one can also have several check-ins in a row (records 3, 4 and 5), with the check-out following later. Normally, there would be a corresponding check-out for every check-in. Unfortunately, the data is not perfect and records are sometimes missing. (In the example there are only two check-outs for three check-ins.)
I would like to get some statistics on the times between check-in and check-out, perhaps on a monthly basis or by weekday and so on. But I also have to find a way to discard records if there is no check-out within X hours, or if I find a check-out without a check-in.
I have been trying with pandas and it looked so promising, but as a newbie I got stuck on all the huge possibilities that this magical package offers.
I hope someone can help me out and maybe explain a little bit where to look.
Many thanks in advance,
avm
Your table is not structured in such a way that you can do this with one query. If you had a check_in_id column (an added column), you could do it with one query. The idea is that there would be at most two rows with the same check_in_id, and they would always have the same Id.
So instead, write a stored procedure that creates a tmp table containing the added column. Your stored procedure would need to iterate over the rows of the table and find the most recent check-out for the given Id that is not already in the tmp table.
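The same row-by-row pairing can be sketched in Python with only the standard library (pandas could then do the grouping). The pairing rule here, where each check-out takes the most recent unmatched check-in of the same Id, and the max_hours cutoff standing in for the question's "X hours" are assumptions, not the answerer's exact procedure; rows are assumed to be sorted by Date_Time.

```python
from datetime import datetime, timedelta


def pair_visits(rows, max_hours):
    """Pair check-ins (A) with check-outs (B) per Id.

    rows: (date_time, id, passed) tuples sorted by date_time.
    Each check-out is matched with the most recent unmatched check-in of
    the same Id. Pairs longer than max_hours are dropped, check-outs with
    no open check-in are skipped, and check-ins never matched are ignored.
    """
    limit = timedelta(hours=max_hours)
    open_checkins = {}  # Id -> stack of unmatched check-in times
    pairs = []
    for stamp, ident, passed in rows:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        stack = open_checkins.setdefault(ident, [])
        if passed == "A":
            stack.append(t)
        elif passed == "B" and stack:
            check_in = stack.pop()
            if t - check_in <= limit:
                pairs.append((ident, check_in, t))
    return pairs


rows = [
    ("2013-06-23 20:13:10", 112, "A"),
    ("2013-06-23 20:58:11", 112, "B"),
    ("2013-06-23 21:01:10", 118, "A"),
    ("2013-06-23 21:03:31", 118, "A"),
    ("2013-06-23 21:05:49", 118, "A"),
    ("2013-06-23 23:05:08", 118, "B"),
    ("2013-06-24 08:10:03", 118, "B"),
]
for ident, check_in, check_out in pair_visits(rows, max_hours=12):
    print(ident, check_in, check_out - check_in)
```

From the resulting pairs, durations can be grouped by check_in.month or check_in.weekday() for per-month or per-weekday statistics; loading the pairs into a pandas DataFrame and using groupby is one way to do that.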

Match exact values on oracle join

Here it goes.
I have two tables, say Application and Report.
These tables have a common column (appId and ExternalAppId, respectively) which can be joined on to find unique values.
My problem is that when I join these 2 tables, I'm getting values which I don't really want.
Sample values
Application Report
No appId ReportNo ExternalAppId
1 123 1 123
2 456 2 00000123
3 789 3 321
So when I put Application.appId = Report.ExternalAppId in my WHERE condition, it returns the rows with 123 and 00000123 from the Report table. The leading zeros are not taken into account in the join. I need exact matches only; in this case, the first row alone. I think the cause of the issue is that appId is a number and ExternalAppId is a varchar, and I can't change this. Is there any workaround? I have seen regexes that remove the leading zeros before matching, but I want to know whether there is a better solution, i.e. can I specify that the join should work only for values that match exactly?
Oracle can only compare two values of the same datatype. I can't stress this enough. In fact, most languages can compare two values only if they are the same datatype. Relations in math are also defined on objects of the same type (so that you can define transitivity, reflexivity...). There's also the saying about apples and oranges: don't try to compare them.
So when you ask Oracle to compare two values of different datatypes, an implicit conversion will take place. In most cases you should avoid these conversions, since the rules of which datatype will be chosen over the other can be quite complex and (like in this case) will often produce bugs because you will incorrectly guess which type will win. You should rely on explicit conversions (conversions that you specify).
I assume that Application (appId) is a NUMBER and Report (ExternalAppId) is of type VARCHAR2. In this case Oracle chose to convert ExternalAppId to a NUMBER, and in the NUMBER space, 00123=123 because numbers have no format.
You should instead have written your join condition as:
to_char(application.appId) = report.externalAppId
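Oracle can't be embedded in a self-contained snippet, so this hedged sketch uses Python's built-in sqlite3 module instead. SQLite's type-affinity rules are not Oracle's, but when an INTEGER column is compared with a TEXT column they also convert the text side to a number, so the same symptom appears, and the same explicit-conversion fix (CAST(... AS TEXT) here, TO_CHAR in Oracle) removes it. Table and column names follow the question's sample; app_no stands in for its No column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE application (app_no INTEGER, appId INTEGER);
    CREATE TABLE report (reportNo INTEGER, externalAppId TEXT);
    INSERT INTO application VALUES (1, 123), (2, 456), (3, 789);
    INSERT INTO report VALUES (1, '123'), (2, '00000123'), (3, '321');
    """
)

# Implicit conversion: the TEXT side is compared as a number, so '00000123' = 123
implicit = conn.execute(
    "SELECT r.externalAppId FROM application a JOIN report r "
    "ON a.appId = r.externalAppId ORDER BY r.reportNo"
).fetchall()

# Explicit conversion: compare as text, so only the exact string '123' matches
explicit = conn.execute(
    "SELECT r.externalAppId FROM application a JOIN report r "
    "ON CAST(a.appId AS TEXT) = r.externalAppId ORDER BY r.reportNo"
).fetchall()

print(implicit)
print(explicit)
```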