SQL Lookup table where there are multiple incoming codes - sql

SQL Lookup
I need to build a lookup table that allows for multiple ‘match from’ possibilities, i.e. find a text code based on an incoming code from, say, Ohio or Vermont, with the possibility of adding other states later. I also need history, so that if an Ohio code changes, the old code can still be found by date without interfering with the current active code.
txtCode | OhioCode | VACode| … future expansion
100A | 567BR | Thing |
100B | 4FJEU | 54DS |
I could use a single table, but that doesn’t seem very efficient. With multiple tables, one for each state, future expansion seems more complicated, but perhaps that is the way to go? Should I use a table to somehow look up the other tables?
So what are the best practices for doing something like this?

A normalized approach would look something more like this. What I don't really understand though is your concept of "currently active code". Not sure what that means in relation to the data posted.
create table CodeLookup
(
txtCode varchar(10) not null
, CodeValue varchar(10) not null
, StateCode char(2) not null
, DateCreated datetime not null
)
insert CodeLookup values
('100A', '567BR', 'OH', getdate())
, ('100A', 'Thing', 'VA', getdate())
, ('100B', '4FJEU', 'OH', getdate())
, ('100B', '54DS', 'VA', getdate())
select *
from CodeLookup

What is txtCode?
The most standard way to do this is a normalized lookup with effective date timestamps.
Txt_Code | State_Code | State_Value | Rec_Strt_Dt | Rec_End_Dt | Current_Flag
100A | OH | 567BR | 12/1/2000 | 12/3/2000 | N
100A | OH | NewValue | 12/3/2000 | 12/31/9999 | Y
100A | VA | Thing | 12/1/2000 | 12/31/9999 | Y
100B | OH | 4FJEU | 12/1/2000 | 12/31/9999 | Y
100B | VA | 54DS | 12/1/2000 | 12/31/9999 | Y
It really does depend on the type of queries you'll be running though.
(and you'll likely want timestamps offset by 1 second for the effective dates)
Then you can index on state_code, txt_code + state_code, current_flag, etc...
Depending on what you're doing.
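To make the effective-date idea concrete, here is a minimal sketch of an as-of lookup against a table shaped like the one above. The table name CodeLookupHistory, the column types, the sample dates, and the convention that Rec_End_Dt is exclusive are all assumptions for illustration; none of them come from the question.

```sql
-- Hypothetical history table; column names follow the layout above.
CREATE TABLE CodeLookupHistory (
    Txt_Code     VARCHAR(10) NOT NULL,
    State_Code   CHAR(2)     NOT NULL,
    State_Value  VARCHAR(10) NOT NULL,
    Rec_Strt_Dt  DATE        NOT NULL,
    Rec_End_Dt   DATE        NOT NULL,  -- assumed exclusive end of validity
    Current_Flag CHAR(1)     NOT NULL
);

INSERT INTO CodeLookupHistory VALUES
  ('100A', 'OH', '567BR',    '2000-12-01', '2000-12-03', 'N'),
  ('100A', 'OH', 'NewValue', '2000-12-03', '9999-12-31', 'Y'),
  ('100A', 'VA', 'Thing',    '2000-12-01', '9999-12-31', 'Y');

-- Historical lookup: which txt code did Ohio's '567BR' map to on 2000-12-02?
SELECT Txt_Code
FROM CodeLookupHistory
WHERE State_Code  = 'OH'
  AND State_Value = '567BR'
  AND Rec_Strt_Dt <= '2000-12-02'
  AND Rec_End_Dt  >  '2000-12-02';

-- Current lookup: only the active row is considered.
SELECT Txt_Code
FROM CodeLookupHistory
WHERE State_Code   = 'OH'
  AND State_Value  = 'NewValue'
  AND Current_Flag = 'Y';
```

Adding a new state is then just a matter of inserting rows with a new State_Code, and a changed code closes the old row and opens a new one rather than overwriting it.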

Related

SQL SPLIT with New Row

I have an MS SQL database with the following data:
ID | ORD_No | Date | User | Note
-----+--------------+------------+------------+---------------
1 | 18/UT00120/ZS| | |---- Saved 10/10/2020 14:08 by John Snow, rest of the note
----Saved on 11/11/2020 13:09 by Mike Kowalsky, rest of the
note ---- Saved on 12/11/2020 11:00 by Barbara Smith, rest of the note
From that I want to create the following output:
ID | ORD_No | Date | User | Note
-----+--------------+------------+----------------+---------------
1 | 18/UT00120/ZS| 10/10/2020 | John Snow | rest of the note
-----+--------------+------------+----------------+---------------
2 | 18/UT00120/ZS| 11/11/2020 | Mike Kowalsky | rest of the note
-----+--------------+------------+----------------+---------------
3 | 18/UT00120/ZS| 12/11/2020 | Barbara Smith | rest of the note
Please advise how I can achieve the required output.
Thanks!
SQL Server does not have very good string processing functionality. You can do this but it is rather painful -- and not going to be flexible for all the variations on what notes might look like.
One big issue is that the built-in string_split() function does not take multi-character delimiters. The following chooses a character that is not likely to be in the notes.
Also, the leading prefix is not consistent -- sometimes there is an "on" and sometimes not. So, this doesn't attempt to extract the "rest of the string". It leaves in the prefix. You could use additional string manipulations to handle this, but I suspect the real problem is more complex.
In any case, this comes quite close to what you want:
select t.id, t.ord_no, trim(s.value), s2.value as date
from t cross apply
string_split(replace(note, '----', '~'), '~') s cross apply
(select top (1) s2.value
from string_split(s.value, ' ') s2
where try_convert(date, s2.value, 101) >= '2000-01-01'
) s2;
Here is a db<>fiddle.
Note that the date inequality is used because select try_convert(date, '') returns '1900-01-01' rather than NULL as I would expect.
I think I have a solution for you, although it might not work in a different scenario. I have used SUBSTRING, CHARINDEX, STRING_SPLIT, REPLACE and CAST to achieve the desired result. Here is my code:
DECLARE @MyTable Table (ID INT, ORD_No VARCHAR(100),Note VARCHAR(300));
INSERT INTO @MyTable VALUES(1,'18/UT00120/ZS','Saved on 10/10/2020 14:08 by John Snow, rest of the note');
INSERT INTO @MyTable VALUES(2,'18/UT00120/ZS','Saved on 11/11/2020 07:08 by Mike Kowalsky, rest of the note');
INSERT INTO @MyTable VALUES(3,'18/UT00120/ZS','Saved on 12/11/2020 16:08 by Barbara Smith, rest of the note');
Select ID,ORD_No ,CAST(substring(Note,9,17) AS DATE) [Date],
(SELECT top 1 value FROM STRING_SPLIT(SUBSTRING(Note,29,CHARINDEX(',',Note,0)),',')) AS [USER],
RIGHT(REPLACE(SUBSTRING(Note, CHARINDEX(',', Note), LEN(Note)), '', ''), len(REPLACE(SUBSTRING(Note, CHARINDEX(',', Note), LEN(Note)), '', ''))-1) AS NOTE
FROM @MyTable
Note: This code will only work if your Note column data is always in same format as you gave in your question. Check also db-fiddle Link.

SQL Query syntax: break-out column values

As an example, let's say my dataset holds:
EMPLOYEE_ID
EMPLOYEE_NAME
EMPLOYEE_ACCT_ID
EMPLOYEE_ACCT_TYPE
EMPLOYEE_ACCT_BALANCE
I would like to present the data in the following way:
EMPLOYEE | CHECKING | SAVINGS | INVESTMENT | XMAS |
_______________________________________________________________________
Mary | 100.00 | 700.00 | 3,000.00 | 175.00
Jim | 850.00 | 600.00 | 1,500.00 | 0.00
TOTAL | 950.00 | 1,300.00 | 4,500.00 | 175.00
Where I'm stuck is how to break out the EMPLOYEE_ACCT_TYPE into columns, with each account type listed with its balance. Thanks in advance.
What you are trying to do is called a pivot. Some systems (e.g. SQL Server) have native support for this in SQL, but only if you know the number of columns in advance (i.e. you would have to hard-code the account types into the SQL). Other systems (e.g. MySQL) don't support pivoting natively, so you would need to write a stored procedure or some dynamic SQL to do it.
Since you don't mention what DBMS you are using, that's about as specific as I can get.
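When the account types are known in advance, a pivot can also be written with plain conditional aggregation, which most SQL dialects accept. The sketch below assumes a table named employee_accounts (the question does not name its table), assumed column types, and made-up ID values; the balances are the ones from the desired output.

```sql
-- Hypothetical table name and types; column names are those listed in the question.
CREATE TABLE employee_accounts (
    EMPLOYEE_ID           INT,
    EMPLOYEE_NAME         VARCHAR(50),
    EMPLOYEE_ACCT_ID      INT,
    EMPLOYEE_ACCT_TYPE    VARCHAR(15),
    EMPLOYEE_ACCT_BALANCE DECIMAL(12,2)
);

INSERT INTO employee_accounts VALUES
  (1, 'Mary', 10, 'CHECKING',    100.00),
  (1, 'Mary', 11, 'SAVINGS',     700.00),
  (1, 'Mary', 12, 'INVESTMENT', 3000.00),
  (1, 'Mary', 13, 'XMAS',        175.00),
  (2, 'Jim',  20, 'CHECKING',    850.00),
  (2, 'Jim',  21, 'SAVINGS',     600.00),
  (2, 'Jim',  22, 'INVESTMENT', 1500.00);

-- One output column per hard-coded account type; a missing account shows as 0.
SELECT EMPLOYEE_NAME AS EMPLOYEE,
       SUM(CASE WHEN EMPLOYEE_ACCT_TYPE = 'CHECKING'   THEN EMPLOYEE_ACCT_BALANCE ELSE 0 END) AS CHECKING,
       SUM(CASE WHEN EMPLOYEE_ACCT_TYPE = 'SAVINGS'    THEN EMPLOYEE_ACCT_BALANCE ELSE 0 END) AS SAVINGS,
       SUM(CASE WHEN EMPLOYEE_ACCT_TYPE = 'INVESTMENT' THEN EMPLOYEE_ACCT_BALANCE ELSE 0 END) AS INVESTMENT,
       SUM(CASE WHEN EMPLOYEE_ACCT_TYPE = 'XMAS'       THEN EMPLOYEE_ACCT_BALANCE ELSE 0 END) AS XMAS
FROM employee_accounts
GROUP BY EMPLOYEE_ID, EMPLOYEE_NAME
ORDER BY EMPLOYEE_ID;
```

This does not produce the TOTAL row; that would need a second aggregate query appended with UNION ALL, or a dialect-specific rollup feature.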
It sounds to me like you need to do some serious normalization, first. Break employee_account data out into its own table:
table: employee_account_data
EMPLOYEE_ACCT_ID int
EMPLOYEE_ACCT_TYPE varchar(15)
EMPLOYEE_ACCT_BALANCE decimal
You'll also need a bridge table, since many employees can have many accounts (many to many):
table: employee_account_lookup
EMPLOYEE_ID int
EMPLOYEE_ACCT_ID int
This way, you won't be repeating employee_name for each account type (as I suspect you are now). If you really wanted to normalize well, you could also create a table to hold the different Employee Account Types. That way you wouldn't have to worry about someone misspelling "Checking" or "Savings" on data entry.

Optimal solution for interview question

Recently in a job interview, I was given the following problem.
Say I have the following table
widget_Name | widget_Costs | In_Stock
---------------------------------------------------------
a | 15.00 | 1
b | 30.00 | 1
c | 20.00 | 1
d | 25.00 | 1
where widget_name holds the name of the widget, widget_costs is the price of a widget, and in_stock is a constant of 1.
Now for my business insurance I have a certain deductible. I am looking to find a SQL statement that will tell me every widget, and its price, that exceeds the deductible. So if my deductible is $50.00 the above would just return
widget_Name | widget_Costs | In_Stock
---------------------------------------------------------
a | 15.00 | 1
d | 25.00 | 1
Since widgets b and c were used to meet the deductible
The closest I could get is the following
SELECT
*
FROM (
SELECT
widget_name,
widget_price
FROM interview.tbl_widgets
minus
SELECT widget_name,widget_price
FROM (
SELECT
widget_name,
widget_price,
50 - sum(widget_price) over (ORDER BY widget_price ROWS between unbounded preceding and current row) as running_total
FROM interview.tbl_widgets
)
where running_total >= 0
)
;
Which gives me
widget_Name | widget_Costs | In_Stock
---------------------------------------------------------
c | 20.00 | 1
d | 25.00 | 1
because it uses a and b to meet the majority of the deductible
I was hoping someone might be able to show me the correct answer
EDIT: I understood the interview question to be asking this: given a table of widgets and their prices, and given a dollar amount, subtract as many of the widgets as you can up to the dollar amount, and return the widgets and prices that remain.
I'll put an answer up, just in case it's easier than it looks, but if the idea is just to return any widget that costs more than the deductible then you'd do something like this:
Select
Widget_Name, Widget_Cost, In_Stock
From
Widgets
Where
Widget_Cost > 50 -- SubSelect for variable deductibles?
For your sample data my query returns no rows.
I believe I understand your question, but I'm not 100%. Here is what I'm assuming you mean:
Your deductible is, say, $50. To meet the deductible you have to "use" two items. (Is this always two? How high can it go? Can it be just one? What if they don't total exactly $50? There is a lot of missing information.) You then want to return the widgets that aren't being used towards the deductible. I have the following.
CREATE TABLE #test
(
widget_name char(1),
widget_cost money
)
INSERT INTO #test (widget_name, widget_cost)
SELECT 'a', 15.00 UNION ALL
SELECT 'b', 30.00 UNION ALL
SELECT 'c', 20.00 UNION ALL
SELECT 'd', 25.00
SELECT * FROM #test t1
WHERE t1.widget_name NOT IN (
SELECT t1.widget_name FROM #test t1
CROSS JOIN #test t2
WHERE t1.widget_cost + t2.widget_cost = 50 AND t1.widget_name != t2.widget_name)
Which returns
widget_name widget_cost
----------- ---------------------
a 15.00
d 25.00
This looks like a bin packing problem; these are really hard to solve, especially with SQL.
If you search on SO for Bin Packing + SQL, you'll find how to find Sum(field) in condition ie “select * from table where sum(field) < 150”, which is basically the same problem except you want to add a NOT IN to it.
I couldn't get the accepted answer by brianegge to work but what he wrote about it in general was interesting
..the problem you describe of wanting the selection of users which would most closely fit into a given size, is a bin packing problem. This is an NP-Hard problem, and won't be easily solved with ANSI SQL. However, the above seems to return the right result, but in fact it simply starts with the smallest item, and continues to add items until the bin is full.
A general, more effective bin packing algorithm would is to start with the largest item and continue to add smaller ones as they fit. This algorithm would select users 5 and 4.
So with this advice you could write a cursor to loop over the table to do just this (it just wouldn't be pretty).
Aaron Alton gives a nice link to a series of articles that attempts to solve the bin packing problem with SQL but basically concludes that it's probably best to use a cursor to do it.
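As an alternative to a cursor, the "largest item first, then add smaller ones as they fit" heuristic quoted above can be sketched with a recursive CTE. This is only a greedy heuristic, not an optimal bin packing solution, and it reflects one reading of the interview question. The syntax below is PostgreSQL/SQLite style (WITH RECURSIVE); the table name widgets, the column types, and the hard-coded deductible of 50 are assumptions.

```sql
-- Sample data from the question, using the widget_cost naming of the later answers.
CREATE TABLE widgets (
    widget_name CHAR(1),
    widget_cost DECIMAL(10,2)
);

INSERT INTO widgets (widget_name, widget_cost) VALUES
  ('a', 15.00), ('b', 30.00), ('c', 20.00), ('d', 25.00);

WITH RECURSIVE ordered AS (
    -- Number the widgets from most to least expensive.
    SELECT widget_name, widget_cost,
           ROW_NUMBER() OVER (ORDER BY widget_cost DESC, widget_name) AS rn
    FROM widgets
),
walk AS (
    -- Anchor: the most expensive widget is used if it fits on its own.
    SELECT rn, widget_name, widget_cost,
           CASE WHEN widget_cost <= 50 THEN widget_cost ELSE 0 END AS used_total,
           CASE WHEN widget_cost <= 50 THEN 1 ELSE 0 END AS used
    FROM ordered
    WHERE rn = 1
    UNION ALL
    -- Step: take the next widget only if it still fits under the deductible.
    SELECT o.rn, o.widget_name, o.widget_cost,
           w.used_total + CASE WHEN w.used_total + o.widget_cost <= 50
                               THEN o.widget_cost ELSE 0 END,
           CASE WHEN w.used_total + o.widget_cost <= 50 THEN 1 ELSE 0 END
    FROM walk AS w
    JOIN ordered AS o ON o.rn = w.rn + 1
)
-- Widgets that were not used towards the deductible.
SELECT widget_name, widget_cost
FROM walk
WHERE used = 0
ORDER BY widget_name;
```

For the sample data the walk uses b (30) and c (20) and skips d because 30 + 25 would exceed 50, so the query returns a and d, matching the expected output in the question.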

Substitute MySQL result

I'm getting the following data from a MySQL database
+----------------+------------+---------------------+----------+
| account_number | total_paid | doc_date | doc_type |
+----------------+------------+---------------------+----------+
| 18 | 54.0700 | 2009-10-22 02:37:09 | IN |
| 425 | 49.9500 | 2009-10-22 02:31:47 | PO |
+----------------+------------+---------------------+----------+
The query is fine and I'm getting the data I need except that the doc_type isn't very human readable. To fix this, I've done the following
CREATE TEMPORARY TABLE doc_type (id char(2), string varchar(60));
INSERT INTO doc_type VALUES
('IN', 'Invoice'),
('PO', 'Online payment'),
('PF', 'Offline payment'),
('CA', 'Credit adjustment'),
('DA', 'Debit adjustment'),
('OR', 'Order');
I then add a join against this temporary table so my doc_type column is easier to read which looks like this
+----------------+------------+---------------------+----------------+
| account_number | total_paid | doc_date | document_type |
+----------------+------------+---------------------+----------------+
| 18 | 54.0700 | 2009-10-22 02:37:09 | Invoice |
| 425 | 49.9500 | 2009-10-22 02:31:47 | Online payment |
+----------------+------------+---------------------+----------------+
Is this the best way to do this? Is it possible to replace the text in one query? I started looking at if statements but it doesn't seem to be what I'm after or maybe I just read it incorrectly.
// EDIT //
Thanks everyone. I suppose I'll keep doing it this way.
Unfortunately, it's not possible to change doc_type to integer as this is an existing database for a billing application I didn't write. I'd end up breaking functionality if I made any changes other than adding a table here and there.
Also appreciate the easy to understand case statement from Rahul. May come in handy later.
Your current way is the best. Arguably, document_type can be changed to an int, to save space and whatnot, but that's irrelevant.
Doing the join will be much faster and readable than any chained ifs.
Not to mention, extensible. Should you need to add a new doc_type, it's just an insert vs. potentially several queries.
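For reference, the join itself might look like the following MySQL sketch. The documents table is a hypothetical stand-in for the billing table, whose real name and column types the question does not give; the doc_type lookup table is the one defined in the question.

```sql
-- `documents` is a hypothetical stand-in for the real billing table.
CREATE TEMPORARY TABLE documents (
    account_number INT,
    total_paid     DECIMAL(10,4),
    doc_date       DATETIME,
    doc_type       CHAR(2)
);
INSERT INTO documents VALUES
  (18,  54.0700, '2009-10-22 02:37:09', 'IN'),
  (425, 49.9500, '2009-10-22 02:31:47', 'PO');

-- Lookup table as defined in the question (shortened to two rows).
CREATE TEMPORARY TABLE doc_type (id CHAR(2), string VARCHAR(60));
INSERT INTO doc_type VALUES
  ('IN', 'Invoice'),
  ('PO', 'Online payment');

-- LEFT JOIN keeps rows whose code has no description yet;
-- COALESCE then falls back to the raw two-letter code.
SELECT d.account_number,
       d.total_paid,
       d.doc_date,
       COALESCE(t.string, d.doc_type) AS document_type
FROM documents AS d
LEFT JOIN doc_type AS t
       ON t.id = d.doc_type;
```

The LEFT JOIN with COALESCE is a design choice rather than a requirement: an inner join would silently drop any row whose doc_type is missing from the lookup table.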
You can use the SQL CASE statement to do this in a single query.
Select account_number, total_paid, doc_date,
case doc_type
when 'IN' then 'Invoice'
when 'PO' then 'Online Payment'
end as document_type
from table
It is the best way to do this :)
If doc_type could be an integer, you also can use ELT function, as in
SELECT ELT(doc_type, 'Invoice', 'Document') FROM table;
but it is still worse than a simple join, as you have to put this into every query and every application that uses the database, and changing a description becomes hell.
IIRC this is the correct way to achieve what you want to do. It's a normalized design
I think you are asking about the design and not about how the data should be fetched? If so, I have always used this kind of design.
This design leads to a normalized database. There won't be consistency problems if you ever need to change a description such as Invoice or Online Payment.
I would suggest changing the doc_type field to an int: not only does it save space (as Tordek said), it is also faster when you execute queries.
First, if you stored Invoice in doc_type as a string, string searches would be extremely slow compared to other data types.
Second, it is case sensitive (which may lead to mistakes).
Third, strings take up more space, so more space is required for storing them in the main table.
Fourth, if you ever had to change the name Invoice to, say, Billing, then searching for Invoice would take time, and every row containing this value would have to be updated.

Need a Complex SQL Query

I need to make a rather complex query, and I need help bad. Below is an example I made.
Basically, I need a query that will return one row for each case_id where the type is support, status start, and date meaning the very first one created (so that in the example below, only the 2/1/2009 John's case gets returned, not the 3/1/2009). The search needs to be dynamic to the point of being able to return all similar rows with different case_id's etc from a table with thousands of rows.
There's more after that but I don't know all the details yet, and I think I can figure it out if you guys (and gals) can help me out here. :)
ID | Case_ID | Name | Date | Status | Type
48 | 450 | John | 6/1/2009 | Fixed | Support
47 | 450 | John | 4/1/2009 | Moved | Support
46 | 451 | Sarah | 3/1/2009 | |
45 | 432 | John | 3/1/2009 | Fixed | Critical
44 | 450 | John | 3/1/2009 | Start | Support
42 | 450 | John | 2/1/2009 | Start | Support
41 | 440 | Ben | 2/1/2009 | |
40 | 432 | John | 1/1/2009 | Start | Critical
...
Thanks a bunch!
Edit:
To answer some people's questions, I'm using SQL Server 2005. And the date is just plain date, not string.
Ok so now I got further in the problem. I ended up with Bliek's solution which worked like a charm. But now I ran into the problem that sometimes the status never starts, as it's solved immediately. I need to include this in as well. But only for a certain time period.
I imagine I'm going to have to check for the case table referenced by FK Case_ID here. So I'd need a way to check for each Case_ID created in the CaseTable within the past month, and then run a search for these in the same table and same manner as posted above, returning only the first result as before. How can I use the other table like that?
As usual I'll try to find the answer myself while waiting, thanks again!
Edit 2:
Seems this is the answer. I don't have access to the full DB yet so I can't fully test it, but it seems to be working with the dummy tables I created, to continue from Bliek's code's WHERE clause:
WHERE RowNumber = 1 AND Case_ID IN (SELECT Case_ID FROM CaseTable
WHERE (Date BETWEEN '2007/11/1' AND '2007/11/30'))
The date's screwed again but you get the idea I'm sure. Thanks for the help everyone! I'll get back if there're more problems, but I think with this info I can improvise my way through most of the SQL problems I currently have to deal with. :)
Maybe something like:
select Case_ID, Name, MIN(date), Status, Type
from table
where type = 'Support'
and status = 'Start'
group by Case_ID, Name, Status, Type
EDIT: You haven't provided a lot of details about what you really want, so I'd suggest that you read all the answers and choose one that suits your problem best. So far I'd say that Tomalak's answer is closest to what you're looking for...
SELECT
c.ID,
c.Case_ID,
c.Name,
c.Date,
c.Status,
c.Type
FROM
CaseTable c
WHERE
c.Type = 'Support'
AND c.Status = 'Start'
AND c.Date = (
SELECT MIN(Date)
FROM CaseTable
WHERE Case_ID = c.Case_ID AND Type = c.Type AND Status = c.Status)
/* GROUP BY only needed when for a given Case_ID several rows
exist that fulfill the WHERE clause */
GROUP BY
c.ID,
c.Case_ID,
c.Name,
c.Date,
c.Status,
c.Type
This query benefits greatly from indexes on the Case_ID, Date, Status and Type columns.
An added benefit is that the filter on Type and Status only needs to be set in one place.
As an alternative to the GROUP BY clause, you can do SELECT DISTINCT, which would increase readability (this may or may not affect overall performance, I suggest you measure both variants against each other). If you are sure that for no Case_ID in your table two rows exist that have the same Date, you won't need GROUP BY or SELECT DISTINCT at all.
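As a rough illustration of the indexing advice, a single composite index could cover both the outer filter and the correlated MIN(Date) subquery. The column types in the CREATE TABLE, the index name, and the column order are assumptions; the question only shows sample rows, so measure against real data before settling on a layout.

```sql
-- Assumed column types; the question only shows sample rows.
CREATE TABLE CaseTable (
    ID      INT PRIMARY KEY,
    Case_ID INT,
    Name    VARCHAR(50),
    Date    DATETIME,
    Status  VARCHAR(20),
    Type    VARCHAR(20)
);

-- Hypothetical index: equality columns first, then Date so that
-- MIN(Date) per (Type, Status, Case_ID) can be read from the index.
CREATE INDEX IX_CaseTable_Type_Status_Case_Date
    ON CaseTable (Type, Status, Case_ID, Date);
```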
In SQL Server 2005 and beyond I would use Common Table Expressions (CTE). This offers lots of possibilities like so:
With ResultTable (RowNumber
,ID
,Case_ID
,Name
,Date
,Status
,Type)
AS
(
SELECT Row_Number() OVER (PARTITION BY Case_ID
ORDER BY Date ASC)
,ID
,Case_ID
,Name
,Date
,Status
,Type
FROM CaseTable
WHERE Type = 'Support'
AND Status = 'Start'
)
SELECT ID
,Case_ID
,Name
,Date
,Status
,Type
FROM ResultTable
WHERE RowNumber = 1
Don't apologize for your date formatting, it makes more sense that way.
SELECT Case_ID, Name, MIN(Date), Status, Type
FROM caseTable
WHERE Type = 'Support'
AND status = 'Start'
-- ID is left out: it is unique per row, so grouping by it would return every matching row
GROUP BY Case_ID, Name, Status, Type