I want to make sure that this conversion of a DECODE function into a SELECT statement joined to a mapping table would run properly, and that I'm not using syntax that works in SQL Server but is different in Oracle SQL.
About the code: it uses the DECODE function to map a series of four-digit medical taxonomy codes to two-digit provider specialty codes. The primary table is PRVDR.TXNMY_CD, and the outcome would be a column named PRFRM_PRVDR_SPCLTY_CD.
Original code:
SELECT
DECODE(SUBSTR(PRVDR.TXNMY_CD, 1, 4),
       '261Q', '70', '347E', '59', '332H', '96', '332B', 'A6', '1711', 'Y9',
       '2257', 'Y9', '106H', '62', '103K', '26', '101Y', '26', '367A', '42',
       '207K', '03', '3416', '59', '367H', '32', '207L', '05', '211D', '48',
       '231H', '64', '2376', '64', '111N', '35', '291U', '69', '103G', '86',
       '364S', '89', '208C', '28', '172V', '60', '251S'
) AS PRFRM_PRVDR_SPCLTY_CD
FROM
NPS_CLM_HDR
My conversion attempt:
First, I'd separately create this table called MAPPING with the following columns
| TXNMY_CD_MAP | PRFRM_PRVDR_SPCLTY_CD |
| 1711 | Y9 |
| 2257 | Y9 |
| 106H | 62 |
| 367A | 42 |
etc.
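For concreteness, the mapping table would be created along these lines (just a sketch in Oracle syntax; the column sizes and constraints are still to be decided):
CREATE TABLE REF.MAPPING (
    TXNMY_CD_MAP          VARCHAR2(4) PRIMARY KEY,  -- first four characters of the taxonomy code
    PRFRM_PRVDR_SPCLTY_CD VARCHAR2(2) NOT NULL      -- two-character specialty code
);
INSERT INTO REF.MAPPING (TXNMY_CD_MAP, PRFRM_PRVDR_SPCLTY_CD) VALUES ('261Q', '70');
INSERT INTO REF.MAPPING (TXNMY_CD_MAP, PRFRM_PRVDR_SPCLTY_CD) VALUES ('347E', '59');
-- ... one row for each pair in the original DECODE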
Then I would use the following query:
SELECT PRFRM_PRVDR_SPCLTY_CD
FROM REF.MAPPING AS M
JOIN PRVDR.TXNMY_CD AS P ON P.TXNMY_CD = M.TXNMY_CD_MAP
Does this look correct or have I used terminology from SQL Server that does not work with Oracle SQL?
Hmmm . . . I am expecting the two columns to be:
TXNMY_CD4 PRFRM_PRVDR_SPCLTY_CD
'261Q' '70'
'347E' '59'
'332H' '96'
. . .
This may already be what your table looks like, but these are the values that should appear at the beginning of the table, since they come from the start of the DECODE list.
And then:
SELECT m.PRFRM_PRVDR_SPCLTY_CD
FROM PRVDR.TXNMY_CD P LEFT JOIN
REF.MAPPING M
ON LEFT(P.TXNMY_CD, 4) = M.TXNMY_CD_MAP
Except for the LEFT() vs. SUBSTR(), this should work in both databases.
Note that this uses LEFT JOIN to ensure that no rows are lost, even if there are no matches.
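Since Oracle has no LEFT() function, the Oracle form of that join would use SUBSTR() instead (a sketch; note also that Oracle does not accept AS before a table alias, which your attempt uses):
SELECT m.PRFRM_PRVDR_SPCLTY_CD
FROM PRVDR.TXNMY_CD p
LEFT JOIN REF.MAPPING m
  ON SUBSTR(p.TXNMY_CD, 1, 4) = m.TXNMY_CD_MAP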
decode() is an Oracle function that is not available in SQL Server. If you wanted to translate your decode to SQL Server, you would use case:
select case left(prvdr.txnmy_cd, 4)
            when '261Q' then '70'
            when '347E' then '59'
            ...
       end as prfrm_prvdr_spclty_cd
from nps_clm_hdr
Note that substr() is not supported in SQL Server - here we use left() instead.
That said, using a mapping table is a better approach: it scales better, and makes it easy to maintain the mapping (there is no need to modify the code of the query, just the data).
You would phrase the query as:
select prfrm_prvdr_spclty_cd
from prvdr.txnmy_cd as p
left join ref.mapping as m on left(p.txnmy_cd, 4) = m.txnmy_cd_map
The left join allows unmapped values.
Not all databases support left() or substr() (it is sometimes called substring(), as in SQL Server). I think that the most portable approach uses like and concat():
left join ref.mapping as m on p.txnmy_cd like concat(m.txnmy_cd_map, '%')
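In Oracle specifically, the same predicate can also be written with the || concatenation operator (a sketch):
left join ref.mapping m on p.txnmy_cd like m.txnmy_cd_map || '%'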
I'm working with a database that has a locations table such as:
locationID | locationHierarchy
1 | 0
2 | 1
3 | 1,2
4 | 1
5 | 1,4
6 | 1,4,5
which makes a tree like this
1
--2
----3
--4
----5
------6
where locationHierarchy is a csv string of the locationIDs of all its ancestors (think of a hierarchy tree). This makes it easy to determine the hierarchy when working toward the top of the tree given a starting locationID.
Now I need to write code to start with an ancestor and recursively find all descendants. MySQL has a function called 'find_in_set' which easily parses a csv string to look for a value. It's nice because I can just say "find in set the value 4" which would give all locations that are descendants of locationID of 4 (including 4 itself).
Unfortunately this is being developed on SQL Server 2014 and it has no such function. The CSV string is of variable length (virtually unlimited levels allowed) and I need a way to find all descendants of a location.
A lot of what I've found on the internet to mimic the find_in_set function in SQL Server assumes a fixed depth of hierarchy (such as 4 levels maximum), which wouldn't work for me.
Does anyone have a stored procedure or anything that I could integrate into a query? I'd really rather not have to pull all records from this table to use code to individually parse the CSV string.
I would imagine searching the locationHierarchy string for locationID% or %,{locationid},% would work but be pretty slow.
I think you want like -- in either database. Something like this:
select l.*
from locations l
where l.locationHierarchy like @LocationHierarchy + ',%';
If you want the original location included, then one method is:
select l.*
from locations l
where l.locationHierarchy + ',' like @LocationHierarchy + ',%';
I should also note that SQL Server has proper support for recursive queries, so it has other options for hierarchies apart from hierarchy trees (which are still a very reasonable solution).
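To illustrate that, here is a minimal recursive-CTE sketch for SQL Server 2014 against the locations table from the question; it assumes, as in your sample data, that the root row has locationHierarchy = '0' and that locationHierarchy is a plain varchar column:
DECLARE @start int = 4;
WITH descendants AS (
    -- anchor: the starting location itself
    SELECT locationID, locationHierarchy
    FROM locations
    WHERE locationID = @start
    UNION ALL
    -- recursive step: a child's hierarchy equals its parent's hierarchy plus the parent's ID
    SELECT c.locationID, c.locationHierarchy
    FROM locations c
    INNER JOIN descendants p
        ON c.locationHierarchy = CASE WHEN p.locationHierarchy = '0'
                                      THEN CAST(p.locationID AS varchar(20))
                                      ELSE p.locationHierarchy + ',' + CAST(p.locationID AS varchar(20))
                                 END
)
SELECT locationID, locationHierarchy
FROM descendants;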
Finally, it worked for me:
SELECT * FROM locations
WHERE locationHierarchy like CONCAT(@param, ',%')
   OR locationHierarchy like CONCAT('%,', @param, ',%')
   OR locationHierarchy like CONCAT('%,', @param)
I have a Hive table with the numeric version of an IP address. I have another table with start, end, location where start and end define a range of numeric IPs associated with a location.
Example
Numeric: 29
start | end | location
----------------------
1 | 11 | 666
12 | 30 | 777
31 | 40 | 888
Output: 29 - 777
I need to use the IP from table 1 to look up the location from table 2. I'm new to Hive and have discovered that I can't use BETWEEN or < > in join statements. I've been trying to figure out some way of making this happen using Hive SQL and can't figure it out. Is there a way? I'm somewhat familiar with UDFs as well if one of those is needed. I'm open to the idea that this isn't possible in Hive and that I need to do it with Pig or a Java Map/Reduce job; I just don't know enough about things at this point to say.
Any help is appreciated. Thanks.
Hive and Pig do not support such an inequality join. You can use a cross join with a WHERE clause to do it, but it's inefficient. A simple example:
SELECT t1.ip, t2.location_ip FROM t1 JOIN t2
WHERE t1.ip >= t2.start_ip AND t1.ip <= t2.end_ip;
However, it seems you want to cross join a big table and a small table. If so, maybe the following statement is more efficient:
SELECT /*+ MAPJOIN(t2) */ t1.ip, t2.location_ip FROM t1 JOIN t2
WHERE t1.ip >= t2.start_ip AND t1.ip <= t2.end_ip;
Suppose I have a SQL query like this:
SELECT
tickets.TicketNumber, history.remarks
FROM
AT_DeviceReplacement_Tickets tickets
INNER JOIN
AT_DeviceReplacement_Tickets_History history
ON tickets.TicketNumber = history.TicketNumber;
I get a table like this in response:
ticketNumber | remarks
-------------+------------
1 | "Hello, there is a problem."
1 | "Did you check the power cable?
1 | "We plugged it in and now it works. Thank you!"
2 | "Hello, this is a new ticket."
Suppose that I want to write a query that will concatenate the remarks for each ticket and return a table like this:
ticketNumber | remarks
-------------+------------
1 | "Hello, there is a problem.Did you check the power cable?We plugged it in and now it works. Thank you!"
2 | "Hello, this is a new ticket."
Yes, in the real code, I've actually got these sorted by date, among other things, but just for the sake of discussion, how would I edit the above query to get the result I described?
Have a look at the following questions:
Can I Comma Delimit Multiple Rows Into One Column?
Is it possible to concatenate column values into a string using CTE?
The cleanest solution to this problem is DB dependent. Lentine's links show very ugly solutions for Oracle and SQL Server and a clean one for MySQL. The answer in PostgreSQL is also very short and easy.
SELECT tickets.ticket_number, string_agg(remarks, ', ')
FROM
AT_DeviceReplacement_Tickets tickets
INNER JOIN
AT_DeviceReplacement_Tickets_History history
ON tickets.Ticket_Number = history.Ticket_Number
GROUP BY tickets.ticket_number;
(Note you have both ticket_number and TicketNumber in your sample code.)
My guess is that Oracle and SQL Server either (1) have a similar aggregate function or (2) have the capability of defining your own aggregate functions. [For MySQL the equivalent aggregate is called GROUP_CONCAT.] What DB are you using?
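For reference, Oracle 11g Release 2 and later have LISTAGG, and SQL Server 2017 and later have STRING_AGG. Rough sketches against your tables:
-- Oracle 11gR2+
SELECT tickets.TicketNumber,
       LISTAGG(history.remarks, ', ') WITHIN GROUP (ORDER BY history.remarks) AS remarks
FROM AT_DeviceReplacement_Tickets tickets
JOIN AT_DeviceReplacement_Tickets_History history
  ON tickets.TicketNumber = history.TicketNumber
GROUP BY tickets.TicketNumber;
-- SQL Server 2017+
SELECT tickets.TicketNumber,
       STRING_AGG(history.remarks, ', ') AS remarks
FROM AT_DeviceReplacement_Tickets tickets
JOIN AT_DeviceReplacement_Tickets_History history
  ON tickets.TicketNumber = history.TicketNumber
GROUP BY tickets.TicketNumber;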
Basically, I'm dealing with a horribly set up table that I'd love to rebuild, but am not sure I can at this point.
So, the table is of addresses, and it has a ton of similar entries for the same address. But there are sometimes slight variations in the address (i.e., a room # is tacked on IN THE SAME COLUMN, ugh).
Like this:
id | place_name | place_street
1 | Place Name One | 1001 Mercury Blvd
2 | Place Name Two | 2388 Jupiter Street
3 | Place Name One | 1001 Mercury Blvd, Suite A
4 | Place Name, One | 1001 Mercury Boulevard
5 | Place Nam Two | 2388 Jupiter Street, Rm 101
What I would like to do in SQL (this is mssql), if possible, is a query like:
SELECT DISTINCT place_name, place_street where [the first 4 letters of the place_name are the same] && [the first 4 characters of the place_street are the same].
to, I guess at this point, get:
Plac | 1001
Plac | 2388
Basically, then I can figure out what are the main addresses I have to break out into another table to normalize this, because the rest are just slight derivations.
I hope that makes sense.
I've done some research and I see people using regular expressions in SQL, but a lot of them seem to be using C# scripts or something. Do I have to write regex functions and save them into SQL Server before executing any regular expressions?
Any direction on whether I can just write them in SQL or if I have another step to go through would be great.
Or on how to approach this problem.
Thanks in advance!
Use the SQL function LEFT:
SELECT DISTINCT LEFT(place_name, 4)
I don't think you need regular expressions to get the results you describe. You just want to take the first few characters of each column and group by the results, which will effectively give you distinct values.
SELECT left(place_name, 4), left(place_street, 4), count(*)
FROM AddressTable
GROUP BY left(place_name, 4), left(place_street, 4)
The count(*) column isn't necessary, but it gives you some idea of which values might have the most (possibly) duplicate address rows in common.
I would recommend you look into Fuzzy Search Operations in SQL Server. You can match the results much better than what you are trying to do. Just google sql server fuzzy search.
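If you want something built in to experiment with first, SQL Server's SOUNDEX() and DIFFERENCE() functions give a crude phonetic match. A sketch (the AddressTable name is assumed):
-- pairs of rows whose place_name sounds similar and whose street starts the same way
SELECT a.id, a.place_name, b.id AS other_id, b.place_name AS other_place_name
FROM AddressTable a
INNER JOIN AddressTable b
    ON a.id < b.id
   AND DIFFERENCE(a.place_name, b.place_name) >= 3  -- 4 = closest match, 0 = no match
   AND LEFT(a.place_street, 4) = LEFT(b.place_street, 4);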
Assuming at least SQL Server 2005 for the CTE:
;with cteCommonAddresses as (
select left(place_name, 4) as LeftName, left(place_street,4) as LeftStreet
from Address
group by left(place_name, 4), left(place_street,4)
having count(*) > 1
)
select a.id, a.place_name, a.place_street
from cteCommonAddresses c
inner join Address a
on c.LeftName = left(a.place_name,4)
and c.LeftStreet = left(a.place_street,4)
order by a.place_name, a.place_street, a.id
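If you then want one candidate row per group to seed the normalized table, one option is ROW_NUMBER() (a sketch, SQL Server 2005+):
select id, place_name, place_street
from (
    select a.id, a.place_name, a.place_street,
           row_number() over (partition by left(a.place_name, 4), left(a.place_street, 4)
                              order by a.id) as rn
    from Address a
) x
where rn = 1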
I need to keep track of different dates (dynamic). So for a specific Task you could have X number of dates to track (for example DDR1 meeting date, DDR2 meeting date, Due Date, etc).
My strategy was to create one table (DateTypeID, DateDescription) which would store the description of each date. Then I could create the main table (ID, TaskDescription, DateTypeID). So all the dates would be in one column, and you could tell what each date represents by looking at the DateTypeID. The problem is displaying it in a grid. I know I should use a crosstab query, but I cannot get it to work. For example, I use a CASE statement in SQL Server 2000 to pivot the table so that each column name is the name of the date type. If we have the following tables:
DateType Table
DateTypeID | DateDescription
1 | DDR1
2 | DDR2
3 | DueDate
Tasks Table
ID | TaskDescription
1 | Create Design
2 | Submit Paperwork
Tasks_DateType Table
TasksID | DateTypeID | Date
1 | 1 | 09/09/2009
1 | 2 | 10/10/2009
2 | 1 | 11/11/2009
2 | 3 | 12/12/2009
THE RESULT SHOULD BE:
TaskDescription | DDR1 | DDR2 | DueDate
Create Design |09/09/2009 | 10/10/2009 | null
Submit Paperwork |11/11/2009 | null | 12/12/2009
If anyone has any idea how I can go about researching this, I appreciate it. The reason I do this instead of making a column for each date has to do with letting the user add as many date types as they want in the future without having to manually add columns to the table and edit HTML code. It also keeps the code simple for comparing dates or showing upcoming tasks by their type (e.g., "Create Design's DDR1 date is coming up"). If anyone can point me in the right direction, I appreciate it.
Here is a proper answer, tested with your data. I only used the first two date types, but you'd build this up on the fly anyway.
Select
Tasks.TaskDescription,
Min(Case DateType.DateDescription When 'DDR1' Then Tasks_DateType.Date End) As DDR1,
Min(Case DateType.DateDescription When 'DDR2' Then Tasks_DateType.Date End) As DDR2
From
Tasks_DateType
INNER JOIN Tasks ON Tasks_DateType.TasksID = Tasks.ID
INNER JOIN DateType ON Tasks_DateType.DateTypeID = DateType.DateTypeID
Group By
Tasks.TaskDescription
EDIT
van mentioned that tasks with no dates won't show up. This is correct. Using left joins (again, mentioned by van) and restructuring the query a bit will return all tasks, even though this is not your need at the moment.
Select
Tasks.TaskDescription,
Min(Case DateType.DateDescription When 'DDR1' Then Tasks_DateType.Date End) As DDR1,
Min(Case DateType.DateDescription When 'DDR2' Then Tasks_DateType.Date End) As DDR2
From
Tasks
LEFT OUTER JOIN Tasks_DateType ON Tasks_DateType.TasksID = Tasks.ID
LEFT OUTER JOIN DateType ON Tasks_DateType.DateTypeID = DateType.DateTypeID
Group By
Tasks.TaskDescription
If the pivoted columns are unknown (dynamic), then you'll have to build up your query manually in either MS SQL 2000 or 2005, i.e., with or without PIVOT.
This involves either executing dynamic sql in a stored procedure (generally a no-no) or querying a view with dynamic sql. The latter is the approach I generally go with.
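To give a rough idea of the dynamic-SQL route (a sketch only, assuming SQL Server 2005+ and the table/column names from the question):
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- build one MIN(CASE ...) column per row in DateType
SELECT @cols = STUFF((
    SELECT ', MIN(CASE WHEN dt.DateDescription = ''' + DateDescription
         + ''' THEN tdt.[Date] END) AS [' + DateDescription + ']'
    FROM DateType
    ORDER BY DateTypeID
    FOR XML PATH('')), 1, 2, '');

SET @sql = N'SELECT t.TaskDescription, ' + @cols + N'
FROM Tasks t
LEFT JOIN Tasks_DateType tdt ON tdt.TasksID = t.ID
LEFT JOIN DateType dt ON dt.DateTypeID = tdt.DateTypeID
GROUP BY t.TaskDescription;';

EXEC sp_executesql @sql;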
For pivoting, I prefer the Rozenshtein method over case statements, as explained here:
http://www.stephenforte.net/PermaLink.aspx?guid=2b0532fc-4318-4ac0-a405-15d6d813eeb8
EDIT
You can also do this in linq-to-sql, but it emits some pretty inefficient code (at least when I view it through linqpad), so I don't recommend it. If you're still curious I can post an example of how to do it.
I don't have personal experience with the PIVOT operator; it may provide a better solution.
But I've used a case statement in the past
SELECT
TaskDescription,
MIN(CASE WHEN Tasks_DateType.DateTypeID = 1 THEN Tasks_DateType.Date END) AS DDR1,
MIN(CASE WHEN Tasks_DateType.DateTypeID = 2 THEN Tasks_DateType.Date END) AS DDR2,
...
FROM Tasks
INNER JOIN Tasks_DateType ON Tasks.ID = Tasks_DateType.TasksID
INNER JOIN DateType ON Tasks_DateType.DateTypeID = DateType.DateTypeID
GROUP BY TaskDescription
This will work, but it will require you to change the SQL whenever more date types are added, so it's not ideal.
EDIT:
It appears as though the PIVOT keyword was added in SQL Server 2005. This example shows how to do a pivot query in both 2000 and 2005, but it is similar to my answer.
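For reference, a hard-coded PIVOT version against the question's tables might look like this (a sketch; it has the same maintenance drawback as the CASE version):
SELECT TaskDescription, [DDR1], [DDR2], [DueDate]
FROM (
    SELECT t.TaskDescription, dt.DateDescription, tdt.Date
    FROM Tasks t
    INNER JOIN Tasks_DateType tdt ON tdt.TasksID = t.ID
    INNER JOIN DateType dt ON dt.DateTypeID = tdt.DateTypeID
) AS src
PIVOT (MIN([Date]) FOR DateDescription IN ([DDR1], [DDR2], [DueDate])) AS p;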
Version-1: +simple, -must be changed every time a DateType is added, so it is not great for a dynamic solution:
SELECT tt.ID,
tt.TaskDescription,
td1.Date AS DDR1,
td2.Date AS DDR2,
td3.Date AS DueDate
FROM Tasks tt
LEFT JOIN Tasks_DateType td1
ON td1.TasksID = tt.ID AND td1.DateTypeID = 1
LEFT JOIN Tasks_DateType td2
ON td2.TasksID = tt.ID AND td2.DateTypeID = 2
LEFT JOIN Tasks_DateType td3
ON td3.TasksID = tt.ID AND td3.DateTypeID = 3
Version-2: completely dynamic (with some limitations, but they can be handled - just google for it):
Dynamic pivot query creation. See Dynamic Cross-Tabs/Pivot Tables: you need to create one SP or UDF and can then use it for multiple purposes. This is the original post, to which you may find many links and improvements.
Version-3: just leave it for your client code to handle. I would not design my SQL to return a dynamic set of data, but rather handle it on the client (presentation layer). I would not like to handle dynamic columns that come back from my query, where I need to guess what each one is. The only reason I use Version-2 is when the result is presented directly as a table for a report. In all other cases for truly dynamic data I use client code. For example: with the structure you have, how will you enforce that the DueDate field is mandatory? You cannot use DB constraints. How will you ensure that DDR1 is not later than DDR2? If these are not separate (static) columns in the database (where you can use CONSTRAINTS), then the client code is the one that validates your data consistency.
Good luck!