Problems implementing SQL query in RapidMiner to create report

My Process:
I'm currently trying to create a custom report of data stored in a MySQL database. Here is a hypothetical example of my table data:
Item_Name  Item_Price  Item_Stock  Item_Timestamp
Dish Soap  3.99        25          1/1/2013 12:00am
Frogs      0.69        26          1/1/2013 12:00am
Frogs      0.69        19          1/1/2013 1:00am
Dish Soap  3.99        28          1/1/2013 1:00am
Item_Timestamp refers to the datetime of when the entry was made.
I'm attempting to use RapidMiner to do the following:
Provide a summation of increases in Item_Stock for each unique Item_Name
Provide a summation of decreases in Item_Stock for each unique Item_Name
Provide the average rate of change over a specified time period
My goal is to create a report that tells me whether items are being restocked at a rate of equilibrium with demand.
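For context, the first two summations can be expressed directly in SQL; here is a sketch using MySQL 8+ window functions (thisTable is the hypothetical table from the query further below):
SELECT Item_Name,
       SUM(CASE WHEN diff > 0 THEN diff ELSE 0 END) AS total_increase,
       SUM(CASE WHEN diff < 0 THEN -diff ELSE 0 END) AS total_decrease
FROM (
    -- Change in stock versus the previous reading for the same item.
    SELECT Item_Name,
           Item_Stock - LAG(Item_Stock) OVER (
               PARTITION BY Item_Name ORDER BY Item_Timestamp) AS diff
    FROM thisTable
) AS d
GROUP BY Item_Name;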
To create a report for each unique Item_Name, I have built a RapidMiner process that loads the unique Item_Name values as an example set, then loops through the example set using the Extract Macro operator, which passes the Item_Name from each example to another SQL query. RapidMiner uses %{macro_name} as the macro syntax. My SQL query looks like:
Select Item_Name
From thisTable
Where Item_Name = %{macro_name}
The problem is that this query throws an exception, but I'm not sure why. Perhaps the problem is that %{macro_name} returns a string without the necessary quotes, but I am unsure.
My questions are:
Is there a way to add quotes around %{macro_name} in my SQL query ?
Is there a better way to create the required report (i.e. not using RapidMiner)?

I figured out the main problem:
My SQL string had syntax problems: I needed to capitalize the keywords, and there should be no space before %{macro_name}:
SELECT Item_Name
FROM `thisTable`
WHERE Item_Name =%{macro_name}
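Quoting the macro should also work if Item_Name is a string column, since RapidMiner substitutes the macro text verbatim into the query before it is sent (a sketch):
SELECT Item_Name
FROM `thisTable`
WHERE Item_Name = '%{macro_name}'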

Related

SQL Converting multiple rows into multiple columns

I have a table that looks like this:
Sku_Code  Channel  Rank  Category  Website       Date
123       US       28    Toys      www.foo.com   2021-06-07
123       US       13    Games     www.lolo.com  2021-06-07
328       CA       12    Toys      www.lo.com    2021-05-12
123       US       2     Games     www.foo.com   2021-06-05
I would like to pivot this table so all ID information falls in one row...like this:
Sku_Code  Channel  Category  Category_1  Website      Website_1     Date
123       US       Toys      Games       www.foo.com  www.lolo.com  2021-06-07
328       CA       Toys                  www.lo.com                 2021-05-12
123       US       Games                 www.foo.com                2021-06-05
It's a fairly large table, so I'm wondering what's the best/fastest way to do this. I know there is a pivot function I could use, but I do not know how to apply it in this situation.
I'm fairly new to SQL so any help would be appreciated.
It's not clear whether an ID can have an arbitrary number of categories. If so, this is not possible in normal SQL alone. The SQL language has a strict requirement for the number and types of columns to be known at query compile time, before looking at any data.
That doesn't mean what you want to do can't happen at all... just that the solution will be more involved. For example, you may need to do the pivot in your reporting tool or client code. The other alternative is dynamic SQL, over three steps: first, run a query to determine how many categories you will need. Second, use the information from step one to build a new SQL statement on the fly, probably involving an additional join back to the same table for each category, the PIVOT keyword, or both. Finally, execute the SQL statement built in step two.
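For a known maximum of two categories per group, the statement built in step two might look something like this sketch, which uses ROW_NUMBER and conditional aggregation rather than PIVOT (my_table is a placeholder name; Rank and Date may need quoting as identifiers in your dialect):
SELECT Sku_Code,
       Channel,
       MAX(CASE WHEN rn = 1 THEN Category END) AS Category,
       MAX(CASE WHEN rn = 2 THEN Category END) AS Category_1,
       MAX(CASE WHEN rn = 1 THEN Website END) AS Website,
       MAX(CASE WHEN rn = 2 THEN Website END) AS Website_1,
       Date
FROM (
    -- Number each row within its Sku_Code/Channel/Date group.
    SELECT Sku_Code, Channel, Category, Website, Date,
           ROW_NUMBER() OVER (PARTITION BY Sku_Code, Channel, Date
                              ORDER BY Rank) AS rn
    FROM my_table
) AS t
GROUP BY Sku_Code, Channel, Date;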

Is it possible to recursively combine similar records (keeping - and adding - only specific columns) using a select?

I've been wracking my brain here trying to figure out a way to achieve a solution to the following without external applications (such as Excel).
I'll set up the structure: We are using a 3rd party ERP that provides a nicely configured conversion system for product packaging types. I'm trying to create a query that will take all conversions for a given product and return them inline. Because the number of conversion records is indeterminate, the query would need to be recursive.
To make things simple, let's use package quantities for this example scenario. If a product can be shipped in [eaches, pairs, sets, packages, and cartons], the conversion table records would look something like this:
pkConvKey   fkProdID  childUnit  parentUnit  chPerParent
ConvRec001  Prod123   each       pair        2
ConvRec002  Prod123   pair       set         3
ConvRec003  Prod123   set        pack        7
ConvRec004  Prod123   pack       carton      24
Using the table above, I can determine how many pairs of Prod123 are contained in a carton by following the math:
24 packs x 7 sets x 3 pairs = 504 pairs per carton.
I could further multiply that by 2 to get the count of individual pieces in a carton (1,008). That's the idea behind the conversion table but here's my actual problem.
I'd like to return a table of records where associated conversions are in-line, thusly:
fkProdID  unit1  unit2  qtyInUnit2  unit3  qtyInUnit3  unit4  qtyInUnit4  unit5   qtyInUnit5
Prod123   each   pair   2           set    3           pack   7           carton  24
Complicating the matter is that the unit types are unknown (arbitrary) values and there is no requirement to have a full, intact chain from unit A to unit Z. (For example, there might be a conversion record from each to pair, and another from set to pack, but not one from pair to set).
In this scenario, the select can't recursively link the records, and they would appear in the resulting table as two separate records - which is fine.
I have attempted to join the table to itself on t1.parentUnit = t2.childUnit, but that obviously doesn't work recursively.
I fear my only solution is to left join the table over and over - as many as 20 times in the query - settling for NULL values if additional conversions do not exist, but then I would also have many duplicate rows (with incomplete conversion chains) to weed out.
Can this be done in a select query?
Thanks in advance!
-Dan
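One possible direction, sketched below: the chain itself can be walked with a recursive CTE (PostgreSQL, SQLite 3.8.3+, MySQL 8+; conv is a hypothetical name for the conversion table, and no cyclic conversions are assumed), though pivoting the result into a fixed set of unitN columns would still require knowing the maximum chain length in advance.
WITH RECURSIVE chain AS (
    -- Anchor: conversions whose childUnit is never a parentUnit for
    -- the same product, i.e. the start of each (possibly broken) chain.
    SELECT fkProdID,
           childUnit,
           parentUnit,
           childUnit || ' -> ' || parentUnit AS path,
           chPerParent AS qtyInChild
    FROM conv c
    WHERE NOT EXISTS (
        SELECT 1 FROM conv p
        WHERE p.fkProdID = c.fkProdID
          AND p.parentUnit = c.childUnit)
    UNION ALL
    -- Step: follow each link upward, multiplying quantities as we go.
    SELECT ch.fkProdID,
           ch.childUnit,
           c.parentUnit,
           ch.path || ' -> ' || c.parentUnit,
           ch.qtyInChild * c.chPerParent
    FROM chain ch
    JOIN conv c
      ON c.fkProdID = ch.fkProdID
     AND c.childUnit = ch.parentUnit
)
SELECT * FROM chain;
For the sample data the longest row ends at qtyInChild = 1008, i.e. 1,008 eaches per carton, matching the arithmetic above.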

MSAccess Slow Updates on Self-Joined table

I am trying to improve the performance of updating only about 60K rows with data coming from different rows in the same table. At about 2 minutes, it's not terrible, but it's not great either, and my application really doesn't work if you have to wait so long between recalculations.
The app generates a set of financial statements for a business, where it calculates basic formulas on 1300 line items, like Rent, or Direct Labor, or Inventory costs, all of which roll up to totals that mimic the Balance Sheet, P&Ls, Cash Flow etc. Many of the line items need to calculate on a month-by-month basis, where for instance it has to figure out April's On Hand Inventory before knowing what April's Inventory Value is. So the total program ends up looping through 48 months over 30 calculation passes, requiring about 8000 SQL statements. (Fortunately it figures it all out by itself!) Each SQL statement takes only a few milliseconds, but it adds up.
I'm pretty sure I can't reduce the number of loops, so I keep trying to figure out how to make each SQL quicker. The basic structure is as follows:
LI: Line item table that holds the basic info of each item, primary key LID
LID Name
123 Sales_1
124 Sales_2
200 Total Sales
Formula: Master/Detail tables that create any formula from the line items
Total sales=Sales_1 + Sales_2
or
{200}={123}+{124}
(I use curly braces to be able to find and replace the LIDs within the formula, as shown in the SQL below)
FC: Formula Calculation table: all line items by month, about 1300 items x 48 months=62K records. Primary key FID
FID   SQL_ID  LID  LID_brace  LIN          OutputMonth  Formula      Amount
3232  25      123  {123}      Sales_1      1                         1200
3255  26      124  {124}      Sales_2      1                         1500
5454  177     200  {200}      Total Sales  1            {123}+{124}
DMO: Operand join table, which links a formula to its detail lines within the same table, so once Sales_1 is calculated, it can find the Total Sales record and update it, which then evaluates and sends its amount up the chain to the other LIDs that depend on it, such as Total Income. It locates the record to update based on the SQL_ID, which is set based on the calc pass and month. It's complex to set up, but pretty straightforward once you actually run things.
Master_FID Detail_FID
5454 3232 (links total sales to sales_1)
5454 3255 (links total sales to sales_2)
SQL1:
UPDATE (FC INNER JOIN DMO ON FC.FID = DMO.Master_FID)
INNER JOIN FC AS FC2 ON DMO.Detail_FID = FC2.FID
SET FC.Formula = Replace(FC.Formula, FC.LID_brace, FC2.Amount)
WHERE FC.SQL_ID = 177;
The above will change {123}+{124} to 1200+1500, which will then evaluate to 2700 when I run the following:
SQL2:
UPDATE FC SET FC.Amount = Eval([FC].[Formula]) WHERE FC.calc_sql_id = 177;
So those two SQL statements are run over and over again, with the only thing changing being the SQL_ID.
There are indexes on SQL_ID, LID, FID, etc.
When measuring, the milliseconds per record can range from 0.04 ms when many records are included (~10K for some passes) up to 10 or 15 ms when just one record is updated. Perhaps it is the setup of the query causing a whole lot of overhead, because the time doesn't seem to be a function of the actual number of records updated? It's also not very consistent: some runs take 20+ ms compared to less than 3 ms when run again.
I know this is a complex question I'm asking that probably doesn't have a simple answer, but I'm just looking for directions on what might help. For instance, would a parameter query help if there isn't a whole lot of change between runs? Does Access have an easier time running a query it knows about in advance, i.e. a named query with parameters vs. dynamic SQL? Am I just doomed because it still needs to run those 8000 queries?
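For reference, this is roughly what a saved parameter query would look like in Access SQL, so that only the parameter value changes between runs (a sketch; pSqlId is a hypothetical parameter name):
PARAMETERS pSqlId Long;
UPDATE FC
SET FC.Amount = Eval(FC.Formula)
WHERE FC.calc_sql_id = pSqlId;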
Also, is there inherently a problem with trying to update the same table through a secondary join table, and/or is there a better way to do it?
Also, is string replacement just inherently inefficient this way? If I tried RegEx, would that be quicker? I would have to write a function that could do that within a query, but it seems like that would be slower.
Thanks in advance, this has been a most vexing problem!!!

Having trouble with left join SQL in SQLite

Short background: I have an SQLite database, a couple of GB in size and growing. It contains a bunch of very simple tables. Each table consists of a 64-bit integer primary index field (TStamp) and a value field (Val). The TStamp field is actually a long-integer representation of a date-time. The tables have widely varying row counts and somewhat variable content types, but that shouldn't matter. A master table (tbIndDate) holds a full range of dates, has the same primary index (TStamp) as the other tables, and holds a human-readable date-time in its Val field. For instance,
The master index table, named tbIndDate:
TStamp Val
634082688000000000 5/1/2010 0:00:00
634082691000000000 5/1/2010 0:05:00
634082694000000000 5/1/2010 0:10:00
634082697000000000 5/1/2010 0:15:00
etc etc
A sample table for automation tag 6FI1.PV, named tb6FI1%PV:
TStamp Val
634085793000000000 41.7
634085796000000000 42.83
634085799000000000 41.44
634085802000000000 40.43
634085805000000000 39.78
etc etc
Getting data into the tables is handled by a little vb.net program, and when a new automation tag is added to the capture list then the program creates a new table using the automation tag name, and begins populating it. That all works real slick.
OK. I've started building a tool for getting data out of the database. It works great for inner joins:
SELECT [tbIndDate].[Val] AS 'Timestamp',[tb6FI1%PV].[Val] AS '6FI1.PV',
[tb6FI34%PV].[Val] AS '6FI34.PV',[tb6AI32%PV].[Val] AS '6AI32.PV'
FROM [tbIndDate],[tb6FI1%PV],[tb6FI34%PV],[tb6AI32%PV]
WHERE [tbIndDate].[TStamp]=[tb6FI1%PV].[TStamp]
AND [tbIndDate].[TStamp]=[tb6FI34%PV].[TStamp]
AND [tbIndDate].[TStamp]=[tb6AI32%PV].[TStamp];
This returns:
Timestamp 6FI1.PV 6FI34.PV 6AI32.PV
1/1/2013 0:00:00 42.4679 1.499 0.8439
1/1/2013 0:05:00 40.3628 1.5048 0.8435
1/1/2013 0:10:00 38.2652 1.5028 0.8436
1/1/2013 0:15:00 37.8582 1.5029 0.8436
Yay! :)
I've also gotten some averaging and time-interval queries working.
However, because tag data is not all available for all dates, I'd like to create the option to list all dates in the master index even if some of the tag tables do not have matching data.
A SELECT query with a left outer join, in other words. Everyone knows that. The data might look like:
Timestamp 6FI1.PV 6FI34.PV 6AI32.PV
1/1/2013 0:00:00 42.4679 1.499 NULL
1/1/2013 0:05:00 40.3628 1.5048 NULL
1/1/2013 0:10:00 38.2652 NULL NULL
1/1/2013 0:15:00 37.8582 NULL 0.8436
Trouble is, none of the SQL I've tried has worked. Here's one that didn't go:
SELECT [tbIndDate].[Val] AS 'Timestamp',[tb6FI1%PV].[Val] AS '6FI1.PV',
[tb6FI34%PV].[Val] AS '6FI34.PV'
FROM [tbIndDate],[tb6FI1%PV],[tb6FI34%PV]
LEFT JOIN [tbIndDate] ON [tbIndDate].[TStamp]=[tb6FI1%PV].[TStamp]
LEFT JOIN [tbIndDate] ON [tbIndDate].[TStamp]=[tb6FI34%PV].[TStamp];
The error was "SQL error or missing database, ambiguous column name: tbIndDate.Val"
I've tried copying the syntax from several examples, but none are exactly the same and my attempts fail.
Am I doing the aliases wrong? The square brackets to accommodate special characters in table names? I'm a complete SQL beginner, so don't hold back with the advice.
It looks like the problem is that you're trying to join [tbIndDate] several times. Try this:
SELECT [tbIndDate].[Val] AS 'Timestamp',[tb6FI1%PV].[Val] AS '6FI1.PV',
[tb6FI34%PV].[Val] AS '6FI34.PV'
FROM [tbIndDate]
LEFT JOIN [tb6FI1%PV] ON [tbIndDate].[TStamp]=[tb6FI1%PV].[TStamp]
LEFT JOIN [tb6FI34%PV] ON [tbIndDate].[TStamp]=[tb6FI34%PV].[TStamp];

SQL Query with multiple values in one column

I've been beating my head on the desk trying to figure this one out. I have a table that stores job information, and reasons for a job not being completed. The reasons are numeric: 01, 02, 03, etc. You can have two reasons for a pending job. If you select two reasons, they are stored in the same column, separated by a comma. This is an example from the JOBID table:
Job_Number User_Assigned PendingInfo
1 user1 01,02
There is another table named Pending, that stores what those values actually represent. 01=Not enough info, 02=Not enough time, 03=Waiting Review. Example:
Pending_Num PendingWord
01 Not Enough Info
02 Not Enough Time
What I'm trying to do is query the database to give me all the job numbers, users, pendinginfo, and pending reason. I can break out the first value, but can't figure out how to do the second. What my limited skills have so far:
select Job_number,user_assigned,SUBSTRING(pendinginfo,0,3),pendingword
from jobid,pending
where
SUBSTRING(pendinginfo,0,3)=pending.pending_num and
pendinginfo!='00,00' and
pendinginfo!='NULL'
What I would like to see for this example would be:
Job_Number User_Assigned PendingInfo PendingWord PendingInfo PendingWord
1 User1 01 Not Enough Info 02 Not Enough Time
Thanks in advance
You really shouldn't store multiple items in one column if your SQL is ever going to want to process them individually. The "SQL gymnastics" you have to perform in those cases are both ugly hacks and performance degraders.
The ideal solution is to split the individual items into separate columns and, for 3NF, move those columns to a separate table as rows if you really want to do it properly (but baby steps are probably okay if you're sure there will never be more than two reasons in the short-medium term).
Then your queries will be both simpler and faster.
However, if that's not an option, you can use the afore-mentioned SQL gymnastics to do something like:
where find(',' || fld || ',', ',02,') > 0
assuming your SQL dialect has a string search function (find in this case, though I think it's CHARINDEX for SQL Server).
This will ensure all sub-columns begin and end with a comma (comma plus field plus comma) and look for a specific desired value (with the commas on either side to ensure it's a full sub-column match).
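In SQL Server syntax, for example, that test might look like this sketch against the question's jobid table:
SELECT Job_Number, User_Assigned, PendingInfo
FROM jobid
WHERE CHARINDEX(',02,', ',' + PendingInfo + ',') > 0;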
If you can't control what the application puts in that column, I would opt for the DBA solution - DBA solutions are defined as those a DBA has to do to work around the inadequacies of their users :-).
Create two new columns in that table and make an insert/update trigger which will populate them with the two reasons that a user puts into the original column.
Then query those two new columns for specific values rather than trying to split apart the old column.
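A sketch of such a trigger in SQL Server syntax (Reason1 and Reason2 are the hypothetical new columns; two-character reason codes are assumed):
CREATE TRIGGER trg_split_pending ON jobid
AFTER INSERT, UPDATE
AS
BEGIN
    UPDATE j
    SET Reason1 = LEFT(i.PendingInfo, 2),
        -- Second reason if present after the comma, NULL otherwise.
        Reason2 = CASE WHEN CHARINDEX(',', i.PendingInfo) > 0
                       THEN SUBSTRING(i.PendingInfo,
                                      CHARINDEX(',', i.PendingInfo) + 1, 2)
                  END
    FROM jobid j
    JOIN inserted i ON j.Job_Number = i.Job_Number;
END;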
This means that the cost of splitting is only paid on row insert/update, not on every single select, amortising that cost efficiently.
Still, my answer is to re-do the schema. That will be the best way in the long term in terms of speed, readable queries and maintainability.
I hope you are just maintaining the code and it's not a brand new implementation.
Please consider using a different approach, with support tables like this:
JOBS TABLE
jobID | userID
--------------
1 | user13
2 | user32
3 | user44
--------------
PENDING TABLE
pendingID | pendingText
---------------------------
01 | Not Enough Info
02 | Not Enough Time
---------------------------
JOB_PENDING TABLE
jobID | pendingID
-----------------
1 | 01
1 | 02
2 | 01
3 | 03
3 | 01
-----------------
You can easily query these tables using JOINs or subqueries.
If you need backward compatibility in your software, you can add a view to achieve this.
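For instance, listing every job with its pending reasons becomes a plain join (table and column names as above):
SELECT j.jobID, j.userID, p.pendingText
FROM JOBS j
JOIN JOB_PENDING jp ON jp.jobID = j.jobID
JOIN PENDING p ON p.pendingID = jp.pendingID;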
I have tables like:
Events
---------
eventId int
eventTypeIds nvarchar(50)
...
EventTypes
--------------
eventTypeId
Description
...
Each Event can have multiple eventtypes specified.
All I do is write two procedures in my site code, not SQL code.
One procedure converts the table field (eventTypeIds) value, like "3,4,15,6", into a ViewState array so I can use it anywhere in code.
The other procedure does the opposite: it collects any options you checked and converts them back into the comma-separated value for storage.
If changing the schema is an option (which it probably should be), shouldn't you implement a many-to-many relationship here, so that you have a bridging table between the two items? That way, you would store the number and its wording in one table, jobs in another, and "failure reasons for jobs" in the bridging table...
Have a look at a similar question I answered here
;WITH Numbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS N
FROM JobId
),
Split AS
(
SELECT JOB_NUMBER, USER_ASSIGNED, SUBSTRING(PENDING_INFO, Numbers.N, CHARINDEX(',', PENDING_INFO + ',', Numbers.N) - Numbers.N) AS PENDING_NUM
FROM JobId
JOIN Numbers ON Numbers.N <= DATALENGTH(PENDING_INFO) + 1
AND SUBSTRING(',' + PENDING_INFO, Numbers.N, 1) = ','
)
SELECT *
FROM Split JOIN Pending ON Split.PENDING_NUM = Pending.PENDING_NUM
The basic idea is that you have to multiply each row as many times as there are PENDING_NUMs, then extract the appropriate part of the string.
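Applied to the example row (PendingInfo = '01,02'), the Split step yields one row per reason, and the final join labels each one, roughly:
JOB_NUMBER  USER_ASSIGNED  PENDING_NUM  PendingWord
1           user1          01           Not Enough Info
1           user1          02           Not Enough Time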
While I agree with the DBA perspective of not storing multiple values in a single field, it is doable, as below, and can be practical for application logic despite some performance issues. Let's say you have 10,000 user groups, each having on average 1,000 members. You may want to have a table user_groups with columns such as groupID and membersID. Your membersID column could be populated like this:
(',10,2001,20003,333,4520,'), each number being a memberID, all separated by commas, with a comma also added at the start and end of the data. Then your SELECT would use LIKE '%,someID,%'.
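A sketch of that search against the hypothetical user_groups table:
SELECT groupID
FROM user_groups
WHERE membersID LIKE '%,2001,%';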
If you cannot change your data ('01,02,03' or similar), and say you want rows containing 01, you can still use SELECT ... WHERE PendingInfo LIKE '01,%' OR PendingInfo LIKE '%,01' OR PendingInfo LIKE '%,01,%', which will ensure it matches at the start, end, or inside the string, while avoiding similar numbers (e.g. 101).