Cross Tab - Storing different dates (Meeting1, Meeting2, Meeting 3 etc) in the same column - sql

I need to keep track of different dates (dynamic). So for a specific Task you could have X number of dates to track (for example DDR1 meeting date, DDR2 meeting date, Due Date, etc).
My strategy was to create one table (DateTypeID, DateDescription) which would store the description of each date. Then I could create the main table (ID, TaskDescription, DateTypeID). So all the dates would be in one column and you could tell what that date represents by looking at the TypeID. The problem is displaying it in a grid. I know I should use a cross tab query, but i cannot get it to work. For example, I use a Case statement in SQL Server 2000 to pivot the table over so that each column name is the name of the date type. IF we have the following tables:
DateType Table
DateTypeID | DateDescription
1 | DDR1
2 | DDR2
3 | DueDate
Tasks Table
ID | TaskDescription
1 | Create Design
2 | Submit Paperwork
Tasks_DateType Table
TasksID | DateTypeID | Date
1 | 1 | 09/09/2009
1 | 2 | 10/10/2009
2 | 1 | 11/11/2009
2 | 3 | 12/12/2009
THE RESULT SHOULD BE:
TaskDescription | DDr1 | DDR2 | DueDate
Create Design |09/09/2009 | 10/10/2009 | null
Submit Paperwork |11/11/2009 | null | 12/12/2009
IF anyone has any idea how I can go about researching this, I appreciate it. The reason I do this instead of making a column for each date, has to do with the ability to let the user in the future add as many dates as they want without having to manually add columns to the table and editing html code. This also allows simple code for comparing dates or show upcoming tasks by their type (ex. 'Create design's DDR1 date is coming up' ) If anyone can point me in the right direction, I appreciate it.

Here is a proper answer, tested with your data. I only used the first two date types, but you'd build this up on the fly anyway.
Select
Tasks.TaskDescription,
Min(Case DateType.DateDescription When 'DDR1' Then Tasks_DateType.Date End) As DDR1,
Min(Case DateType.DateDescription When 'DDR2' Then Tasks_DateType.Date End) As DDR2
From
Tasks_DateType
INNER JOIN Tasks ON Tasks_DateType.TaskID = Tasks.TaskID
INNER JOIN DateType ON Tasks_DateType.DateTypeID = DateType.DateTypeID
Group By
Tasks.TaskDescription
EDIT
van mentioned that tasks with no dates won't show up. This is correct. Using left joins (again, mentioned by van) and restructuring the query a bit will return all tasks, even though this is not your need at the moment.
Select
Tasks.TaskDescription,
Min(Case DateType.DateDescription When 'DDR1' Then Tasks_DateType.Date End) As DDR1,
Min(Case DateType.DateDescription When 'DDR2' Then Tasks_DateType.Date End) As DDR2
From
Tasks
LEFT OUTER JOIN Tasks_DateType ON Tasks_DateType.TaskID = Tasks.TaskID
LEFT OUTER JOIN DateType ON Tasks_DateType.DateTypeID = DateType.DateTypeID
Group By
Tasks.TaskDescription

If the pivoted columns are unknown (dynamic), then you'll have to build up your query manually in either ms-sql 2000 or 2005, ie with out without PIVOT.
This involves either executing dynamic sql in a stored procedure (generally a no-no) or querying a view with dynamic sql. The latter is the approach I generally go with.
For pivoting, I prefer the Rozenshtein method over case statements, as explained here:
http://www.stephenforte.net/PermaLink.aspx?guid=2b0532fc-4318-4ac0-a405-15d6d813eeb8
EDIT
You can also do this in linq-to-sql, but it emits some pretty inefficient code (at least when I view it through linqpad), so I don't recommend it. If you're still curious I can post an example of how to do it.

I don't have personal experience with the pivot operator, it may provide a better solution.
But I've used a case statement in the past
SELECT
TaskDescription,
CASE(DateTypeID = 1, Tasks_DateType.Date) AS DDr1,
CASE(DateTypeID = 2, Tasks_DateType.Date) AS DDr2,
...
FROM Tasks
INNER JOIN Tasks_DateType ON Tasks.ID = Tasks_DateType.TasksID
INNER JOIN DateType ON Tasks_DateType.DateTypeID = DateType.DateTypeID
GROUP BY TaskDescription
This will work, but will require you to change the SQL whenever there are more Task descriptions added, so it's not ideal.
EDIT:
It appears as though the PIVOT keyword was added in SqlServer 2005, this example shows how to do a pivot query in both 2000 & 2005, but it is similar to my answer.

Version-1: +simple, -must be changed every time DateType is added. So is not great for a dynamic solution:
SELECT tt.ID,
tt.TaskDescription,
td1.Date AS DDR1,
td2.Date AS DDR2,
td3.Date AS DueDate
FROM Tasks tt
LEFT JOIN Tasks_DateType td1
ON td1.TasksID = tt.ID AND td1.DateTypeID = 1
LEFT JOIN Tasks_DateType td2
ON td2.TasksID = tt.ID AND td2.DateTypeID = 2
LEFT JOIN Tasks_DateType td3
ON td3.TasksID = tt.ID AND td3.DateTypeID = 3
Version-2: completely dynamic (with some limitations, but they can be handled - just google for it):
Dynamic pivot query creation. See Dynamic Cross-Tabs/Pivot Tables: you need to create one SP of UDF and then can use it for multiple purposes. This is the original post, to which you may find many links and improvements.
Version-3: just leave it for your client code to handle. I would not design my SQL to return a dynamic set of data, but rather handle it on the client (presentation layer). I just would not like to handle some dynamic columns that come as a result of my query, where I need to guess what is that exactly. The only reason I use Version-2 is when the result is presented directly as a table for a report. In all other cases for truly dynamic data I use client code. For example: having structure you have, how will you attach logic that field DueDate is mandatory - you cannot use DB constraints; how will you ensure that DDR1 is not higher then DDR2? If these are not separate (static) columns in the database (where you can use CONSTRAINTS), then the client code is the one that validates your data consistency.
Good luck!

Related

Select specific grouped element

Hi I would like to ask you because I cannot find solution.
For example I have data like that:
number | date | user
10 | 2022-07-01 | A
15 | 2022-07-08 | A
9 | 2022-07-10 | A
Right now I need get the number for user where date is the newer one.
In this case I need get value 9
Ofcourse I have many diffrent users it is only for illustrate the issue.
Now I would like to select all unique users with his number that date is the newer one.
Is it possible to do it in one query?
I like to teach these problems by breaking it down into smaller problems.
First, write a query that tells you which is the "newer one" for each user.
SELECT user, max(date) as newer_one
FROM tbl
GROUP BY user
Next, you can join that result back to the original data. My favorite style is called a CTE which makes things readable and easier to debug. Like so,
WITH newest AS (
SELECT user, max(date) as newer_one
FROM tbl
GROUP BY user
)
SELECT original.*
FROM tbl AS original
INNER JOIN newest
ON original.user = newest.user
AND original.date = newest.newer_one
Some RDBMS don't support this CTE style, but you can put the query in the body which makes it harder to read sometimes but will work basically anywhere.
SELECT original.*
FROM tbl AS original
INNER JOIN
(
SELECT user, max(date) as newer_one
FROM tbl
GROUP BY user
) as newest
ON original.user = newest.user
AND original.date = newest.newer_one

How do you 'join' multiple SQL data sets side by side (that don't link to each other)?

How would I go about joining results from multiple SQL queries so that they are side by side (but unrelated)?
The reason I am thinking of this is so that I can run 1 query in Google Big Query and it will return 1 single table which I can import into Excel and do some charts.
e.g. Query 1 looks at dataset TableA and returns:
**Metric:** Sales
**Value:** 3,402
And then Query 2 looks at dataset TableB and returns:
**Name:** John
**DOB:** 13 March
They would both use different tables and different filters, etc.
What would I do to make it look like:
---Sales----------John----
---3,402-------13 March----
Or alternatively:
-----Sales--------3,402-----
-----John-------13 March----
Or is there a totally different way to do this?
I can see the use case for the above, I've used something similar to create a single table from multiple tables with different metrics to query in Data Studio so that filters apply to all data in the dataset for example. However in that case, the data did share some dimensions that made it worthwhile doing.
If you are going to put those together with no relationship between the tables, I'd have 4 columns with TYPE describing the data in that row to make for easier filtering.
Type | Sales | Name | DOB
Use UNION ALL to put the rows together so you have something like
"Sales" | 3402 | null | null
"Customer Details" | null | John | 13 March
However, like the others said, make sure you have a good reason to do that otherwise you're just creating a bigger table to query for no reason.

Why does ROW_NUMBER in a view not respect filters

I am using SQL Server 2014.
I have created a view which surfaces patient history answers such as tabaco use, alcohol use, etc. These answers are time stamped but not linked to the appointment identifiers, and I need to find the most recent answer relative to the appointment date.
The data the view surfaces looks like this:
PATIENT_ID | ANSWER_DATE | TOBACCO_USE
1 | 1/1/2018 | No
1 | 1/5/2018 | Yes
1 | 1/10/2018 | Quit
2 | 1/1/2018 | No
I know I can use ROW_NUMBER() in a inline query when I join to this table to get the ranking I need, but I really want to add ROW_NUMBER()OVER(PARTITION BY PATIENT_ID ORDER BY ANSWER_DATE DESC) as 'rnkDesc' column to the view, to make it simpler for other developers to properly join to this table.
With this new column a SELECT * from the view looks like this:
PATIENT_ID | ANSWER_DATE | TOBACCO_USE | rnkDesc
1 | 1/1/2018 | No | 3
1 | 1/5/2018 | Yes | 2
1 | 1/10/2018 | Quit | 1
2 | 1/1/2018 | No | 1
That is as expected, now I join from my appointments table like so:
FROM APPOINTMENTS appt
LEFT JOIN myHistoryView his
on appt.PATIENT_ID = his.PATIENT_ID
and his.ANSWER_DATE <= APPT.APPT_DATE
and his.rnkDesc = 1
This does not work though, as it appears like the ROW_NUMBER is evaluated before the filters are applied. If I filter my view where PATIENT_ID = 1 and ANSWER_DATE = 1/5/2018 then rnkDesc still shows 2, instead of 1 like it would if I was using ROW_NUMBER in an inline query.
I am really interested in why this behaves this way. I can code around it by using an inline query.
I know that these ranking functions are nondeterministic, and would have thought the engine would filter the result set in the view before it generates the ROW_NUMBER. I tried this with RANK and DENSE_RANK as well, at it appears that these also behave the same way. (determined before the filters are applied.)
If you implemented this with CTEs or a subquery, you'd see the same results. This is necessary because sometimes you need to generate a rank on a result, and then have that rank be unchanged by outer queries. So it is as if the rank is generated first as part of the subquery, and then it is "locked in" so you can filter the results based on that row number.
Let's imagine if one of your filters in the outer query was actually rnkDesc = 2, which is a way that sometimes you can do things like get "2nd most". Imagine if the row number was not generated until after the outer query filter was generated, this would make this type of approach impossible. How do you filter the results on the value of something that hasn't been determined yet? This is the same reason that filtering on a window function usually requires first nesting a subquery or a CTE, so you can filter on the generated results, and those results don't get renumbered/ranked in the outer dynamically.
Therefore it makes sense to lock-in windowed function results based on the nesting they occur in. You have to kind of think about this kind of nesting in terms of subresults.
So that answers your question of "Why" it is this way. I understand what you're trying to achieve though and why you want to do this. You're trying to simplify the use of the window function and have it apply dynamically to the final result. I'm honestly not sure "how" you'd do this without just including the window function in the outer query. There may be a way to embed it in a UDF, but I'm not sure.
It's important to understand that there is a clear, indeed deterministic, order of evaluation in SQL.
The ROW_NUMBER on the view has already been evaluated before the left-join occurs. The ON clause does not act as a "filter" on the table to which the join refers, but as a condition that must be met for a join between the tables to occur.
You could of course create a "parameterised view" (a table-valued inline function), which allows you to pass in the filter date to a where-clause before the ROW_NUMBER is applied in the select-clause, and then outer-apply onto it. That may be appropriate if a large number of queries use the same fuctionality.
But otherwise I'd be inclined to leave the "substance use history" view unadulterated with any row-numbering (unless it is used independently by other queries to get the absolute latest answer), and write the "the latest row on or before the current appointment" logic inline.

Use Access SQL to do a grouped ranking

How do I rank salespeople by # customers grouped by department (with ties included)?
For example, given this table, I want to create the Rank column on the right. How should I do this in Access?
SalesPerson Dept #Customers Rank
Bill DeptA 20 1
Ted DeptA 30 2
Jane DeptA 40 3
Bill DeptB 50 1
Mary DeptB 60 2
I already know how to do a simple ranking with this SQL code. But I don't know how to rework this to accept grouping.
Select Count(*) from [Tbl] Where [#Customers] < [Tblx]![#Customers] )+1
Also, there's plenty of answers for this using SQL Server's Rank() function, but I need to do this in Access. Suggestions, please?
SELECT *, (select count(*) from tbl as tbl2 where
tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl
Just add the dept field to the subquery...
Great solution with subquery! Except for huge recordsets, the subquery solution gets very slow. Its better(quicker) to use a Self JOIN, look at the folowing solution: self join
SELECT tbl1.SalesPerson , count(*) AS Rank
FROM tbl AS tbl1 INNER JOIN tbl AS tbl2 ON tbl1.DEPT = tbl2.DEPT
AND tbl1.#Customers < tbl2.#Customers
GROUP BY tbl1.SalesPerson
I know this is an old thread. But since I spent a great deal of time on a very similar problem and was greatly helped by the former answers given here, I would like to share what I have found to be a MUCH faster way. (Beware, it is more complicated.)
First make another table called "Individualizer". This will have one field containing a list of numbers 1 through the-highest-rank-that-you-need.
Next create a VBA module and paste this into it:
'Global Declarations Section.
Option Explicit
Global Cntr
'*************************************************************
' Function: Qcntr()
'
' Purpose: This function will increment and return a dynamic
' counter. This function should be called from a query.
'*************************************************************
Function QCntr(x) As Long
Cntr = Cntr + 1
QCntr = Cntr
End Function
'**************************************************************
' Function: SetToZero()
'
' Purpose: This function will reset the global Cntr to 0. This
' function should be called each time before running a query
' containing the Qcntr() function.
'**************************************************************
Function SetToZero()
Cntr = 0
End Function
Save it as Module1.
Next, create Query1 like this:
SELECT Table1.Dept, Count(Table1.Salesperson) AS CountOfSalesperson
FROM Table1
GROUP BY Table1.Dept;
Create a MakeTable query called Query2 like this:
SELECT SetToZero() AS Expr1, QCntr([ID]) AS Rank, Query1.Dept,
Query1.CountOfSalesperson, Individualizer.ID
INTO Qtable1
FROM Query1
INNER JOIN Individualizer
ON Query1.CountOfSalesperson >= Individualizer.ID;
Create another MakeTable query called Query3 like this:
SELECT SetToZero() AS Expr1, QCntr([Identifier]) AS Rank,
[Salesperson] & [Dept] & [#Customers] AS Identifier, Table1.Salesperson,
Table1.Dept, Table1.[#Customers]
INTO Qtable2
FROM Table1;
If you have another field already that uniquely identifies every row you wouldn't need to create an Identifier field.
Run Query2 and Query3 to create the tables.
Create a fourth query called Query4 like this:
SELECT Qtable2.Salesperson, Qtable2.Dept, Qtable2.[#Customers], Qtable1.ID AS Rank
FROM Qtable1
INNER JOIN Qtable2 ON Qtable1.Rank = Qtable2.Rank;
Query4 returns the result you are looking for.
Practically, you would want to write a VBA function to run Query2 and Query3 and then call that function from a button placed in a convenient location.
Now I know this sounds ridiculously complicated for the example you gave. But in real life, I am sure your table is more complicated than this. Hopefully my examples can be applied to your actual situation. In my database with over 12,000 records this method is by FAR the fastest (as in: 6 seconds with 12,000 records compared to over 1 minute with 262 records ranked with the subquery method).
The real secret for me was the MakeTable query because this ranking method is useless unless you immediately output the results to a table. But, this does limit the situations that it can be applied to.
P.S. I forgot to mention that in my database I was not pulling results directly from a table. The records had already gone through a string of queries and multiple calculations before they needed to be ranked. This probably contributed greatly to the huge difference in speed between the two methods in my situation. If you are pulling records directly from a table, you might not notice nearly as big an improvement.
You need to do some math. I typically take advantage of the combination of a counter field and an "offset" field. You're aiming for a table which looks like this (#Customers isn't necessary, but will give you a visual that you're doing it properly):
SalesPerson Dept #Customers Ctr Offset
Bill DeptA 20 1 1
Ted DeptA 30 2 1
Jane DeptA 40 3 1
Bill DeptB 50 4 4
Mary DeptB 60 5 4
So, to give rank, you'd do [Ctr]-[Offset]+1 AS Rank
build a table with SalesPerson, Dept, Ctr, and Offset
insert into that table, ordered by Dept and #Customers (so that they're all sorted properly)
Update Offset to be the MIN(Ctr), grouping on Dept
Perform your math calculation to determine Rank
Clear out the table so you're ready to use it again next time.
To add to this and any other related Access Ranking or Rank Tie Breaker how-tos for other versions of Access, ranking should not be performed on crosstab queries if your FROM clause happens to NOT contain a table but a query that is either a crosstab query or a query that contains within it elsewhere a crosstab query.
The code referenced above where a SELECT statement within a SELECT statment is used (sub query),
"SELECT *, (select count(*) from tbl as tbl2 where tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl"
will not work and will always fail expressing a error on portion of the code where "tbl.customers > tbl2.customers" cannot be found.
In my situation on a past project, I was referencing a query instead of a table and within that query I had referenced a crosstab query thus failing and producing an error. I was able to resolve this by creating a table from the crosstab query first, and when I referenced the newly created table in the FROM clause, it started working for me.
So in final, normally you can reference a query or table in the FROM clause of the SELECT statement as what was shared previously above to do ranking, but be carefull as to if you are referencing a query instead of a table, that query must Not be a crosstab query or reference another query that is a crosstab query.
Hope this helps anyone else that may have had problems looking for a possible reason if you happen to reference the statements above and you are not referencing a table in your FROM clause within your own project. Also, performing subqueries on aliases with crosstab queries in Access probably isn't good idea or best practice either so stray away from that if/when possible.
If you found this useful, and wish that Access would allow the use of a scrolling mouse in a passthru query editor, give me a like please.
I normally pick tips and ideas from here and sometimes end up building amazing things from it!
Today, (well let’s say for the past one week), I have been tinkering with Ranking of data in Access and to the best of my ability, I did not anticipate what I was going to do something so complex as to take me a week to figure it out! I picked titbits from two main sites:
https://usefulgyaan.wordpress.com/2013/04/23/ranking-in-ms-access/ (seen that clever ‘>=’ part, and the self joins? Amazing… it helped me to build my solution from just one query, as opposed to the complex method suggested above by asonoftheMighty (not discrediting you… just didn’t want to try it for now; may be when I get to large data I might want to try that as well…)
Right here, from Paul Abott above ( ‘and tbl.dept = tbl2.dept’)… I was lost after ranking because I was placing AND YearID = 1, etc, then the ranking would end up happening only for sub-sets, you guessed right, when YearID = 1! But I had a lot of different scenarios…
Well, I gave that story partly to thank the contributors mentioned, because what I did is to me one of the most complex of the ranking that I think can help you in almost any situation, and since I benefited from others, I would like to share here what I hope may benefit others as well.
Forgive me that I am not able to post my table structures here, it is a lot of related tables. I will only post the query, so if you need to you may develop your tables to end up with that kind of query. But here is my scenario:
You have students in a school. They go through class 1 to 4, can either be in stream A or B, or none when the class is too small. They each take 4 exams (this part is not important now), so you get the total score for my case. That’s it. Huh??
Ok. Lets rank them this way:
We want to know the ranking of
• all students who ever passed through this school (best ever student)
• all students in a particular academic year (student of the year)
• students of a particular class (but remember a student will have passed through all classes, so basically his/her rank in each of those classes for the different years) this is the usual ranking that appears in report cards
• students in their streams (above comment applies)
• I would also like to know the population against which we ranked this student in each category
… all in one table/query. Now you get the point?
(I normally like to do as much of my 'programming' in the database/queries to give me visuals and to reduce the amount of code I will later have to right. I actually won't use this query in my application :), but it let's me know where and how to send my parameters to the query it came from, and what results to expect in my rdlc)
Don't you worry, here it is:
SELECT Sc.StudentID, Sc.StudentName, Sc.Mark,
(SELECT COUNT(Sch.Mark) FROM [StudentScoreRankTermQ] AS Sch WHERE (Sch.Mark >= Sc.Mark)) AS SchoolRank,
(SELECT Count(s.StudentID) FROM StudentScoreRankTermQ AS s) As SchoolTotal,
(SELECT COUNT(Yr.Mark) FROM [StudentScoreRankTermQ] AS Yr WHERE (Yr.Mark >= Sc.Mark) AND (Yr.YearID = Sc.YearID) ) AS YearRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS Yt WHERE (Yt.YearID = Sc.YearID) ) AS YearTotal,
(SELECT COUNT(Cl.Mark) FROM [StudentScoreRankTermQ] AS Cl WHERE (Cl.Mark >= Sc.Mark) AND (Cl.YearID = Sc.YearID) AND (Cl.TermID = Sc.TermID) AND (Cl.ClassID=Sc.ClassID)) AS ClassRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS C WHERE (C.YearID = Sc.YearID) AND (C.TermID = Sc.TermID) AND (C.ClassID = Sc.ClassID) ) AS ClassTotal,
(SELECT COUNT(Str.Mark) FROM [StudentScoreRankTermQ] AS Str WHERE (Str.Mark >= Sc.Mark) AND (Str.YearID = Sc.YearID) AND (Str.TermID = Sc.TermID) AND (Str.ClassID=Sc.ClassID) AND (Str.StreamID = Sc.StreamID) ) AS StreamRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS St WHERE (St.YearID = Sc.YearID) AND (St.TermID = Sc.TermID) AND (St.ClassID = Sc.ClassID) AND (St.StreamID = Sc.StreamID) ) AS StreamTotal,
Sc.CalendarYear, Sc.Term, Sc.ClassNo, Sc.Stream, Sc.StreamID, Sc.YearID, Sc.TermID, Sc.ClassID
FROM StudentScoreRankTermQ AS Sc
ORDER BY Sc.Mark DESC;
You should get something like this:
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| StudentID | StudentName | Mark | SchoolRank | SchoolTotal | YearRank | YearTotal | ClassRank | ClassTotal | StreamRank | StreamTotal | Year | Term | Class | Stream |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| 1 | Jane | 200 | 1 | 20 | 2 | 12 | 1 | 9 | 1 | 5 | 2017 | I | 2 | A |
| 2 | Tom | 199 | 2 | 20 | 1 | 12 | 3 | 9 | 1 | 4 | 2016 | I | 1 | B |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
Use the separators | to reconstruct the result table
Just an idea about the tables, each student will be related to a class. Each class relates to years. Each stream relates to a class. Each term relates to a year. Each exam relates to a term and student and a class and a year; a student can be in class 1A in 2016 and moves on to class 2b in 2017, etc…
Let me also add that this a beta result, I have not tested it well enough and I do not yet have an opportunity to create a lot of data to see the performance. My first glance at it told me that it is good. So if you find reasons or alerts you want to point my way, please do so in comments so I may keep learning!

SQL Query with multiple values in one column

I've been beating my head on the desk trying to figure this one out. I have a table that stores job information, and reasons for a job not being completed. The reasons are numeric,01,02,03,etc. You can have two reasons for a pending job. If you select two reasons, they are stored in the same column, separated by a comma. This is an example from the JOBID table:
Job_Number User_Assigned PendingInfo
1 user1 01,02
There is another table named Pending, that stores what those values actually represent. 01=Not enough info, 02=Not enough time, 03=Waiting Review. Example:
Pending_Num PendingWord
01 Not Enough Info
02 Not Enough Time
What I'm trying to do is query the database to give me all the job numbers, users, pendinginfo, and pending reason. I can break out the first value, but can't figure out how to do the second. What my limited skills have so far:
select Job_number,user_assigned,SUBSTRING(pendinginfo,0,3),pendingword
from jobid,pending
where
SUBSTRING(pendinginfo,0,3)=pending.pending_num and
pendinginfo!='00,00' and
pendinginfo!='NULL'
What I would like to see for this example would be:
Job_Number User_Assigned PendingInfo PendingWord PendingInfo PendingWord
1 User1 01 Not Enough Info 02 Not Enough Time
Thanks in advance
You really shouldn't store multiple items in one column if your SQL is ever going to want to process them individually. The "SQL gymnastics" you have to perform in those cases are both ugly hacks and performance degraders.
The ideal solution is to split the individual items into separate columns and, for 3NF, move those columns to a separate table as rows if you really want to do it properly (but baby steps are probably okay if you're sure there will never be more than two reasons in the short-medium term).
Then your queries will be both simpler and faster.
However, if that's not an option, you can use the afore-mentioned SQL gymnastics to do something like:
where find ( ',' |fld| ',', ',02,' ) > 0
assuming your SQL dialect has a string search function (find in this case, but I think charindex for SQLServer).
This will ensure all sub-columns begin and start with a comma (comma plus field plus comma) and look for a specific desired value (with the commas on either side to ensure it's a full sub-column match).
If you can't control what the application puts in that column, I would opt for the DBA solution - DBA solutions are defined as those a DBA has to do to work around the inadequacies of their users :-).
Create two new columns in that table and make an insert/update trigger which will populate them with the two reasons that a user puts into the original column.
Then query those two new columns for specific values rather than trying to split apart the old column.
This means that the cost of splitting is only on row insert/update, not on _every single select`, amortising that cost efficiently.
Still, my answer is to re-do the schema. That will be the best way in the long term in terms of speed, readable queries and maintainability.
I hope you are just maintaining the code and it's not a brand new implementation.
Please consider to use a different approach using a support table like this:
JOBS TABLE
jobID | userID
--------------
1 | user13
2 | user32
3 | user44
--------------
PENDING TABLE
pendingID | pendingText
---------------------------
01 | Not Enough Info
02 | Not Enough Time
---------------------------
JOB_PENDING TABLE
jobID | pendingID
-----------------
1 | 01
1 | 02
2 | 01
3 | 03
3 | 01
-----------------
You can easily query this tables using JOIN or subqueries.
If you need retro-compatibility on your software you can add a view to reach this goal.
I have a tables like:
Events
---------
eventId int
eventTypeIds nvarchar(50)
...
EventTypes
--------------
eventTypeId
Description
...
Each Event can have multiple eventtypes specified.
All I do is write 2 procedures in my site code, not SQL code
One procedure converts the table field (eventTypeIds) value like "3,4,15,6" into a ViewState array, so I can use it any where in code.
This procedure does the opposite it collects any options your checked and converts it in
If changing the schema is an option (which it probably should be) shouldn't you implement a many-to-many relationship here so that you have a bridging table between the two items? That way, you would store the number and its wording in one table, jobs in another, and "failure reasons for jobs" in the bridging table...
Have a look at a similar question I answered here
;WITH Numbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS N
FROM JobId
),
Split AS
(
SELECT JOB_NUMBER, USER_ASSIGNED, SUBSTRING(PENDING_INFO, Numbers.N, CHARINDEX(',', PENDING_INFO + ',', Numbers.N) - Numbers.N) AS PENDING_NUM
FROM JobId
JOIN Numbers ON Numbers.N <= DATALENGTH(PENDING_INFO) + 1
AND SUBSTRING(',' + PENDING_INFO, Numbers.N, 1) = ','
)
SELECT *
FROM Split JOIN Pending ON Split.PENDING_NUM = Pending.PENDING_NUM
The basic idea is that you have to multiply each row as many times as there are PENDING_NUMs. Then, extract the appropriate part of the string
While I agree with DBA perspective not to store multiple values in a single field it is doable, as bellow, practical for application logic and some performance issues. Let say you have 10000 user groups, each having average 1000 members. You may want to have a table user_groups with columns such as groupID and membersID. Your membersID column could be populated like this:
(',10,2001,20003,333,4520,') each number being a memberID, all separated with a comma. Add also a comma at the start and end of the data. Then your select would use like '%,someID,%'.
If you can not change your data ('01,02,03') or similar, let say you want rows containing 01 you still can use " select ... LIKE '01,%' OR '%,01' OR '%,01,%' " which will insure it match if at start, end or inside, while avoiding similar number (ie:101).