I am working with a legal software called Case Aware. You can do limited sql searches and I have had luck in getting Case Aware to pull a specific value from the database. My problem is that I need to create a sql search that returns multiple values but the Case Aware software will only accept one result as an answer. If my query produces a list, it will only recognize the top value. This is a limitation of the software I cannot get around.
My very basic search is:
select rate
From case_fin_info
where fin_info_id = 7 and rate!=0
This should produce a list of 3-15 rates, which it does when the search is run straight against the database. However, when run through Case Aware, only the first rate in the table is pulled. I need to pull the values through Case Aware because with Case Aware I can automatically insert the results into a template. (Where I work generates hundreds if not thousands of these a day, so doing it manually is a B$##%!)
I need to find a way to pull all the values from the search into one value. I cannot use XML (Case Aware will give an error) and I cannot create a temporary table. (Again, a Case Aware limitation) If possible, I also need to insert a manual return between each value so they are separated in the document I am pulling this information into.
Case Aware does not have any user manual and you pay for support (We do have it) but I have my doubts on their abilities. I have been able to easily create queries that they have told me in the past are impossible. I am hoping this is one of those times.
IntegrationGirly
Addtl FYI:
I currently have this kludge: pulling each value individually from the database, even if it is null, and putting each value into a table in the document (30 separate searches). It "works" but takes much longer for the document to generate, and it also leaves a great deal of empty space. Some cases have 3 values, most have 5-10, but we have up to 30 areas for rate because once in a blue moon we need them. This makes the template look horribly junky. That doesn't affect the lawyers who generate the docs, since they don't see it, but every time they generate the table they have to take out all the empty columns. With the number of docs we do each day, 1) this becomes time consuming, and 2) it assumes attorneys and paralegals know how to take rows out of tables in Word.
First, my condolences for having to work with such terrible software.
Second, here's a possible solution (this is assuming SQL Server):
1) Execute a SELECT COUNT(*) FROM case_fin_info WHERE fin_info_id = 7 AND rate <> 0. Store the result (number of rows) in your client application.
2) In your client app, do a for (i = 0; i < count; i++) loop. During each iteration, perform the query
WITH OrderedRates AS
(
SELECT Rate, ROW_NUMBER() OVER (ORDER BY <table primary key> ASC) AS 'RowNum'
FROM case_fin_info WHERE fin_info_id = 7 AND rate <> 0
)
SELECT Rate FROM OrderedRates WHERE RowNum = <i>
Replacing the stuff in <> as appropriate (<i> is the current loop counter, so each iteration fetches the next row rather than the same one). Essentially you get the row count in your client app, then pull one row at a time. It's inefficient as hell, but if you only have 15 rows it shouldn't be too bad.
I had a similar query to implement in my application. This should work.
DECLARE @Rate VARCHAR(8000);
SELECT @Rate = COALESCE(@Rate + ', ', '') + CAST(rate AS VARCHAR(20))
FROM case_fin_info WHERE fin_info_id = 7 AND rate != 0;
SELECT @Rate;
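If the database behind Case Aware is SQL Server 2017 or later (an assumption; earlier versions lack this function), STRING_AGG collapses the rows in a single statement, and a separator variable gives you the manual return between values:

```sql
-- CR+LF separator so each rate lands on its own line in the document.
-- STRING_AGG requires the separator to be a literal or a variable.
DECLARE @sep VARCHAR(2) = CHAR(13) + CHAR(10);

SELECT STRING_AGG(CAST(rate AS VARCHAR(20)), @sep) AS all_rates
FROM case_fin_info
WHERE fin_info_id = 7 AND rate != 0;
```

Because this is a single SELECT producing a single value, it should pass through Case Aware's one-result limitation unchanged.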
Here's a single query that will return the one result in a single column. It assumes your manual return is CR + LF, and you would need to expand it to handle all 15 rates.
SELECT ISNULL(max(Rate1) + CHAR(13) + CHAR(10), '')
     + ISNULL(max(Rate2) + CHAR(13) + CHAR(10), '')
     + ISNULL(max(Rate3) + CHAR(13) + CHAR(10), '')
     + ISNULL(max(Rate4) + CHAR(13) + CHAR(10), '')
     + ISNULL(max(Rate5) + CHAR(13) + CHAR(10), '')
FROM (
    -- Spread each rate into its own column, keyed on RateID.
    -- The ISNULLs above keep the whole result from going NULL
    -- when fewer than 5 rates exist.
    SELECT CASE RateID WHEN 1 THEN CAST(rate AS VARCHAR(20)) END AS Rate1,
           CASE RateID WHEN 2 THEN CAST(rate AS VARCHAR(20)) END AS Rate2,
           CASE RateID WHEN 3 THEN CAST(rate AS VARCHAR(20)) END AS Rate3,
           CASE RateID WHEN 4 THEN CAST(rate AS VARCHAR(20)) END AS Rate4,
           CASE RateID WHEN 5 THEN CAST(rate AS VARCHAR(20)) END AS Rate5
    FROM (
        SELECT RateID, rate FROM case_fin_info WHERE fin_info_id = 7 AND rate != 0
    ) AS r
) AS Rates
I have an audit table where specific actions are being recorded (such as 'access', 'create', 'update' and so on). I am selecting these records so that they can be displayed in a table to the administrative user.
This works fine when I select all the records for a particular entity. However, because I am using the Post-Redirect-Get pattern, the 'access' records are logged on every page view. In a typical session an end user may hit the same page 6 or 7 times within the same 5-minute window. As a consequence, the administrative user has to scroll through quite a few redundant access records, which understandably detracts from the user experience.
To solve this problem, I have written two queries. The first will look for all records that are not access records. The second will look for access records and then groups them into ten minute intervals. I then UNION these two queries and order by the datetime.
-- Select non 'access' records
SELECT
[ORIGIN_ID]
,[ORIGIN_ID_TYPE]
,[REFERENCE_ID]
,[REFERENCE_ID_TYPE]
,[ACTION_TYPE_ID]
,CAST([ORIGINAL_VALUE] AS VARCHAR(8000)) AS ORIGINAL_VALUE
,CAST([CHANGED_VALUE] AS VARCHAR(8000)) AS CHANGED_VALUE
,[CREATED_BY]
,[CREATED_ON]
FROM [HISTORY]
WHERE [ORIGIN_ID] = 500 AND [ORIGIN_ID_TYPE] = 4 AND [ACTION_TYPE_ID] != 1
UNION
-- Select 'access' records and group them into 10 minute intervals by ts
SELECT
[ORIGIN_ID]
,[ORIGIN_ID_TYPE]
,[REFERENCE_ID]
,[REFERENCE_ID_TYPE]
,[ACTION_TYPE_ID]
,CAST([ORIGINAL_VALUE] AS VARCHAR(255)) AS ORIGINAL_VALUE
,CAST([CHANGED_VALUE] AS VARCHAR(255)) AS CHANGED_VALUE
,[CREATED_BY]
,DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [CREATED_ON]) / 10 * 10, 0) AS CREATED_ON
FROM [HISTORY]
WHERE [ACTION_TYPE_ID] = 1 AND [ORIGIN_ID] = 500 AND [ORIGIN_ID_TYPE] = 4
GROUP BY
[ORIGIN_ID]
,[ORIGIN_ID_TYPE]
,[REFERENCE_ID]
,[REFERENCE_ID_TYPE]
,[ACTION_TYPE_ID]
,CAST([ORIGINAL_VALUE] AS VARCHAR(255))
,CAST([CHANGED_VALUE] AS VARCHAR(255))
,[CREATED_BY]
,DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [CREATED_ON]) / 10 * 10, 0)
ORDER BY [CREATED_ON] DESC
SQLFiddle (SQLFiddle limited how much data I could upload)
I feel like there may be a better way to do this that does not require me to use UNION. In order to do it this way I had to cast my TEXT columns to VARCHAR columns and I feel like there could be a better alternative. Any suggestions as to how this query can be improved?
Eliminate the union by using these two groupings. The second one also becomes the new expression for the combined created_on column. The first can also be used to control sorting and then otherwise discarded. (Don't forget to remove the filter on action_type_id too.):
case when action_type_id <> 1 then 1 else 2 end,
case when action_type_id <> 1
then created_on
else DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [CREATED_ON]) / 10 * 10, 0)
end
This will cause the query to treat the two types of actions as distinct for purposes of aggregation. Since you do want to keep every row with a non-1 action, you don't collapse those into ten-minute blocks at all.
Note that this wouldn't quite work as is if it's possible to have two such rows recorded with the same timestamp. You'd need to group on another ID (or just row_number()) to get around that but I suspect that'll be unnecessary.
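Put together (a sketch only, assuming the HISTORY columns from the question and keeping the VARCHAR(8000) casts, since TEXT columns can't be grouped directly), the single-pass version would look something like this:

```sql
SELECT
     [ORIGIN_ID], [ORIGIN_ID_TYPE], [REFERENCE_ID], [REFERENCE_ID_TYPE]
    ,[ACTION_TYPE_ID]
    ,CAST([ORIGINAL_VALUE] AS VARCHAR(8000)) AS ORIGINAL_VALUE
    ,CAST([CHANGED_VALUE]  AS VARCHAR(8000)) AS CHANGED_VALUE
    ,[CREATED_BY]
    -- non-access rows keep their timestamp; access rows snap to 10-minute blocks
    ,CASE WHEN [ACTION_TYPE_ID] <> 1 THEN [CREATED_ON]
          ELSE DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [CREATED_ON]) / 10 * 10, 0)
     END AS CREATED_ON
FROM [HISTORY]
WHERE [ORIGIN_ID] = 500 AND [ORIGIN_ID_TYPE] = 4   -- no ACTION_TYPE_ID filter
GROUP BY
     [ORIGIN_ID], [ORIGIN_ID_TYPE], [REFERENCE_ID], [REFERENCE_ID_TYPE]
    ,[ACTION_TYPE_ID]
    ,CAST([ORIGINAL_VALUE] AS VARCHAR(8000))
    ,CAST([CHANGED_VALUE]  AS VARCHAR(8000))
    ,[CREATED_BY]
    ,CASE WHEN [ACTION_TYPE_ID] <> 1 THEN 1 ELSE 2 END
    ,CASE WHEN [ACTION_TYPE_ID] <> 1 THEN [CREATED_ON]
          ELSE DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [CREATED_ON]) / 10 * 10, 0)
     END
ORDER BY CREATED_ON DESC
```

Non-access rows group on their exact timestamp (so nothing collapses), while access rows group on their 10-minute bucket, all in one scan of the table.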
I'm writing a SQL Server procedure to optimize the cutting of bars. I haven't found the best method yet; a recursive CTE seems to be the way, but I'm stuck.
For my test, I have to cut 18 pieces (3 each of 1000 mm, 1500 mm, 2500 mm, 3500 mm, 4500 mm and 6000 mm), and I have 3 sizes of bars (5500 mm, 7000 mm and 8500 mm).
First, I generate every possible combination of cuts within a single bar.
I tried that with a WHILE loop and a temporary table; it took an hour and a half. But I think I can do better with a CTE...
Now I must generate every combination of several bars that covers my 18 cuts. I wrote another CTE, but I haven't found a way to stop the recursion once at least one combination contains all the cuts. So my query finds over 150 million combinations, with 8, 9, 10, 11... bars, and keeps looping all the way up to 18 bars. I want it to stop at 8 bars (I know that is the smallest number of bars I need for my cuts). As it is, it takes more than two days!
I have 2 temporary tables. One holds my combinations of cuts per bar (#COMBI_BARRE), with this structure: ID_ART (identity for the article), COLOR, CUT_COMBI (a varchar concatenating the cut IDs in the bar: 1-2-3-4...), NB_CUTS (an integer with the count of cuts in the bar) and FIRST_CUT (the smallest cut ID in the bar).
The other temporary table, #DET_BAR, holds the detail of my cuts, with 2 columns: ID_COMBI_BAR (the bar combination ID) and ID_CUT_STR (the cut ID as a varchar, to avoid CAST/CONVERT inside the CTE for better performance).
I store the result in a table called Combi, with ID_ART, COLOR, a varchar column COMBI concatenating the bar combination IDs (1-2-3-4...), a varchar column COMBI_CUT concatenating the cut IDs (1-2-3-4-5...), NB_BAR (the count of bars in the combination), NB_CUTS (the count of cuts in the combination) and MAX_CUTS (the total number of cuts I must make for the article and color).
Since the CTE does one level of recursion per bar, I tried to add an EXISTS clause to stop the recursion once the current level has at least one combination with all my cuts; I know I must not cut 10 bars if I can do it with 8. But I get the error "the recursive table has multiple references".
How can I write my query so that it stops recursing as soon as a complete combination is found?
;WITH Combi (ID_ART, COLOR, COMBI, COMBI_CUT, NB_BAR, NB_CUTS, MAX_CUTS)
AS
( SELECT C.ID_ART,
C.COLOR,
'-' + ID_COMBI_BAR_STR + '-',
'-' + C.CUT_COMBI + '-',
1,
C.NB_CUTS,
ISNULL(MAXI.CUT_NUM,0)
FROM #COMBI_BARRE C with(nolock)
outer apply (select top 1 D.CUT_NUM
from #DEBITS D
where D.ID_ART = C.ID_ART
and D.COLOR= C.COLOR
order by D.NUM_OCC_DEB desc) MAXI
WHERE C.FIRST_CUT = 1
UNION ALL
SELECT C.ID_ART,
C.COLOR,
Combi.COMBI + ID_COMBI_BAR_STR + '-',
Combi.COMBI_CUT+ C.CUT_COMBI + '-',
Combi.NB_BAR+ 1,
Combi.NB_CUTS+ C.NB_CUTS,
Combi.MAX_CUTS
FROM #COMBI_BARRE C with(nolock)
INNER JOIN Combi on C.ID_ART = Combi.ID_ART
and C.COLOR= Combi.COLOR
where C.FIRST_CUT > Combi.NB_BAR
and Combi.NB_CUTS+ C.NB_CUTS<= Combi.MAX_CUTS
and NOT EXISTS(select * from #DET_BAR D with(nolock)
where D.ID_COMBI_BAR = C.ID_COMBI_BAR
and PATINDEX(D.ID_CUT_STR, Combi.COMBI_CUT) > 0)
and NOT EXISTS(select top 1 * from Combi Combi2 where Combi2.ID_ART = C.ID_ART and Combi2.COLOR = C.COLOR and Combi2.NB_CUTS = Combi2.MAX_CUTS)
)
select * from Combi
This is a variation of the bin packing problem. That search term might help you in the right direction.
Also, you can go to my Bin Packing page, which gives several approaches to a more simplified version of your problem.
A small warning: the linked article(s) don't use any (recursive) CTE, so they won't answer your specific CTE question.
I have a table with a primary key made of two 32-bit integers. I want to filter by an explicit list of these pairs, and want to know the fastest approach. There are 3 ways I can think of. My question simply is: which is faster, the second method or the third?
1st method, which I do not want to use because with a long list (only filtering for 2 rows in this example) it gets messy or needs a temp table, so it is not as concise:
select *
from [table]
where
(
([int1] = 123 and [int2] = 456)
OR ([int1] = 654 and [int2] = 321)
--etc
)
2nd method: convert to varchar
select *
from [table]
where convert(varchar(10), [int1]) + ',' + convert(varchar(10), [int2]) IN ('123,456','654,321')
3rd method: combine the two 32-bit integers into a single 64-bit integer
select *
from [table]
where convert(bigint, [int1]) * 4294967296 + [int2] IN (528280977864,2808908611905)
Edit
Thanks to Aron's suggestion, I tried using statistics. These are the results on a table with > 1 million rows, averaged over 10 trials each:
Time Statistic                 Method 1   Method 2   Method 3
Client processing time             22.1        2.7        2.9
Total execution time              300.5     1099.8     1317.3
Wait time on server replies       278.4     1097.1     1314.4
So querying on the columns as-is is by far the fastest, but if I had to pick between the second and third methods, varchar is faster (which surprises me).
Your first method:
select *
from [table]
where ([int1] = 123 and [int2] = 456) OR
      ([int1] = 654 and [int2] = 321)
      --etc
Should be the fastest because it can take advantage of an index on (int1, int2). Perhaps the fastest method for a large list is to store the pairs in a temporary table with an index (clustered or unclustered) on int1 and int2.
I would shy away from playing around with the values. The bulk of the effort of the query is reading the data pages. Slight variations in comparison logic will have little impact on the query.
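For the large-list case, here is a sketch of that temporary-table variant (table and column names assumed to match the example):

```sql
-- Load the wanted key pairs once, with a clustered PK the join can seek on.
CREATE TABLE #pairs (
    int1 int NOT NULL,
    int2 int NOT NULL,
    PRIMARY KEY (int1, int2)
);

INSERT INTO #pairs (int1, int2)
VALUES (123, 456), (654, 321);   -- ...and the rest of the list

-- With an index on [table](int1, int2), this becomes a series of seeks.
SELECT t.*
FROM [table] t
INNER JOIN #pairs p
        ON p.int1 = t.[int1]
       AND p.int2 = t.[int2];
```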
Maybe you need to give a better example?
I tried your example and performance looks fine. A bigger result set might be more telling; try comparing the estimated plans.
create table #table (int1 int,int2 int)
insert into #table values(123,456);
insert into #table values(654,321);
select *
from #table
where
(
([int1] = 123 and [int2] = 456)
OR ([int1] = 654 and [int2] = 321)
)
select *
from #table
where convert(varchar(10), [int1]) +'-'+ convert(varchar(10), [int2]) IN ('123-456','654-321')
select *
from #table
where convert(bigint, [int1]) * 4294967296 + [int2] IN (528280977864,2808908611905)
--drop table #table
All three give almost the same estimated cost: 33% each...
The 1st method, which you don't want to use because it's messy, seems to be the fastest way; just keep the values in two columns and index them.
Query speed in SQL depends far less on the number of fields queried or the complexity of the query than on whether the query can use an index.
I have a DB with two tables
tblVideos is about 8 million rows and contains Id (identity 1,1), videoId, Name, Tags, and (FK) VideoProviderId
tblVideoProviders holds about 6 providers at the moment and has 3 columns:
Id (identity 1,1, tinyint), Name, and Url (used to build the link from the provider URL + video Id)
Unlike YouTube, the smaller providers don't have an API that returns an array I could pick something random from.
Retrieving a totally random row takes under a second with both approaches I have now:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url, tblVideos.Name,
tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
WHERE ((ABS(CAST(
(BINARY_CHECKSUM
(tblVideos.id, NEWID())) as int))
% 6800000) < 10 )
OR
slightly longer
select top 1 tblVideoProvider.Url + tblVideos.videoId as url,
tblVideos.Name, tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
ORDER BY NEWID()
but once I start looking for something more specific:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url, tblVideos.Name,
tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
where (tblVideos.tags like '%' + #tag + '%')
or (tblVideos.Name like '%' + #tag + '%')
ORDER BY NEWID()
The query hits 8 seconds; removing the last OR (the tblVideos.Name LIKE) brings it down to 4-5 seconds, but that's still way too high.
Retrieving the whole result set without the ORDER BY NEWID() makes the query take a lot less time, but then the application consumes about 0.2-2 MB of data per user, and assuming 200-400 simultaneous requests that adds up to a lot of data.
In general the LIKE operator is very expensive, and when the pattern starts with '%', even an index on the respective column (assuming you have one) cannot be used. I think there is no easy way to increase the performance of your query.
Our application has a CustomerNumber field. We have hundreds of different people using the system (each has their own login and their own list of CustomerNumbers). An individual user might have at most 100,000 customers. Many have less than 100.
Some people only put actual numbers into their customer number fields, while others use a mixture of things. The system allows 20 characters which can be A-Z, 0-9 or a dash, and stores these in a VARCHAR2(20). Anything lowercase is made uppercase before being stored.
Now, let's say we have a simple report that lists all the customers for a particular user, sorted by Customer Number. e.g.
SELECT CustomerNumber,CustomerName
FROM Customer
WHERE User = ?
ORDER BY CustomerNumber;
This is a naive solution as the people that only ever use numbers do not want to see a plain alphabetic sort (where "10" comes before "9").
I do not wish to ask the user any unnecessary questions about their data.
I'm using Oracle, but I think it would be interesting to see some solutions for other databases. Please include which database your answer works on.
What do you think the best way to implement this is?
Probably your best bet is to pre-calculate a separate column and use that for ordering and use the customer number for display. This would probably involve 0-padding any internal integers to a fixed length.
The other possibility is to do your sorting post-select on the returned results.
Jeff Atwood has put together a blog posting about how some people calculate human friendly sort orders.
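In Oracle (the asker's database), a minimal sketch of the pre-calculated column idea follows; the column name, the padding width, and the use of a one-off UPDATE rather than a trigger are all assumptions:

```sql
-- Hypothetical pre-calculated sort key: purely numeric customer numbers
-- get zero-padded to a fixed width so they sort numerically as strings.
ALTER TABLE Customer ADD (CustomerNumberSort VARCHAR2(20));

UPDATE Customer
SET CustomerNumberSort =
        CASE WHEN REGEXP_LIKE(CustomerNumber, '^[0-9]+$')
             THEN LPAD(CustomerNumber, 20, '0')
             ELSE CustomerNumber
        END;

-- The report orders by the padded copy but displays the original.
SELECT CustomerNumber, CustomerName
FROM Customer
WHERE User = ?
ORDER BY CustomerNumberSort;
```

In production you would maintain CustomerNumberSort in a trigger (or make it a virtual column on 11g+) so it stays in sync with CustomerNumber.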
In Oracle 10g:
SELECT cust_name
FROM t_customer c
ORDER BY
REGEXP_REPLACE(cust_name, '[0-9]', ''), TO_NUMBER(REGEXP_SUBSTR(cust_name, '[0-9]+'))
This will sort by the first occurrence of a number, regardless of its position, i.e.:
customer1 < customer2 < customer10
cust1omer ? customer1
cust8omer1 ? cust8omer2
where "?" means the order is undefined.
That suffices for most cases.
To force sort order on case 2, you may add REGEXP_INSTR(cust_name, '[0-9]', 1, n) to the ORDER BY list n times, forcing order on the position of the n-th (2nd, 3rd, etc.) group of digits.
To force sort order on case 3, you may add TO_NUMBER(REGEXP_SUBSTR(cust_name, '[0-9]+', 1, n)) to the ORDER BY list n times, forcing order on the value of the n-th group of digits.
In practice, the query I wrote is enough.
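For example, to additionally order on the value of the second group of digits (case 3 above), the extended query would look like this; note that REGEXP_SUBSTR's fourth argument is the occurrence:

```sql
SELECT cust_name
FROM t_customer c
ORDER BY
    REGEXP_REPLACE(cust_name, '[0-9]', ''),               -- letters only
    TO_NUMBER(REGEXP_SUBSTR(cust_name, '[0-9]+', 1, 1)),  -- 1st digit group
    TO_NUMBER(REGEXP_SUBSTR(cust_name, '[0-9]+', 1, 2))   -- 2nd digit group
```

With this, cust8omer1 sorts before cust8omer2 because the second digit groups (1 and 2) now break the tie.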
You may create a function-based index on these expressions, but you'll need to force its use with a hint, and a one-pass SORT ORDER BY will be performed anyway, as the CBO doesn't trust function-based indexes enough to allow an ORDER BY on them.
You could have a numeric column [CustomerNumberInt] that is only used when the CustomerNumber is purely numeric (NULL otherwise[1]), then
ORDER BY CustomerNumberInt, CustomerNumber
[1] depending on how your SQL version handles NULLs in ORDER BY you might want to default it to zero (or infinity!)
I have a similar horrible situation and have developed a suitably horrible function to deal with it (SQLServer)
In my situation I have a table of "units" (this is a work-tracking system for students, so a unit in this context represents a course they're doing). Units have a code which, for the most part, is purely numeric, but for various reasons it was made a varchar and they decided to prefix some with up to 5 characters. So they expect 53, 123, 237, 356 to sort numerically, but also T53, T123, T237, T356.
UnitCode is a nvarchar(30)
Here's the body of the function:
declare @sortkey nvarchar(30)
select @sortkey =
case
when @unitcode like '[^0-9][0-9]%' then left(@unitcode,1) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-1)
when @unitcode like '[^0-9][^0-9][0-9]%' then left(@unitcode,2) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-2)
when @unitcode like '[^0-9][^0-9][^0-9][0-9]%' then left(@unitcode,3) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-3)
when @unitcode like '[^0-9][^0-9][^0-9][^0-9][0-9]%' then left(@unitcode,4) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-4)
when @unitcode like '[^0-9][^0-9][^0-9][^0-9][^0-9][0-9]%' then left(@unitcode,5) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-5)
when @unitcode like '%[^0-9]%' then @unitcode
else left('000000000000000000000000000000',30-len(@unitcode)) + @unitcode
end
return @sortkey
I wanted to shoot myself in the face after writing that, however it works and seems not to kill the server when it runs.
I used this in SQL Server and it works great. The solution is to pad the numeric values with a leading character so that all are of the same string length.
Here is an example using that approach:
select MyCol
from MyTable
order by
case IsNumeric(MyCol)
when 1 then Replicate('0', 100 - Len(MyCol)) + MyCol
else MyCol
end
The 100 should be replaced with the actual length of that column.