As part of my SQL query, I first create a table called 'new' based on certain conditions and then insert into it some columns from a different table, old, based on other conditions. Here's a simplified version of what I am trying to do:
item | subitem | stage        | id |
-----|---------|--------------|----|
i1   | s1      | picked       | 1  |
i1   | s2      | shipped      | 1  |
i2   | s4      | picked       | 2  |
i3   | s10     | shipped      | 2  |
i3   | s11     | eligible     | 0  |
i4   | s2      | not eligible | 0  |
i1   | s1      | picked       | 3  |
i1   | s2      | picked       | 3  |
I want the output as follows:
item1 | subitem1 | item2 | subitem2 | pair_volume | item1pick | item1ship | item2pick | item2ship |
------|----------|-------|----------|-------------|-----------|-----------|-----------|-----------|
i1    | s1       | i1    | s2       | 2           | 2         | 0         | 2         | 1         |
i1    | s1       | i2    | s4       | 1           | 1         | 0         | 1         | 1         |
....
.....
....
Essentially, what I want to do here is the following:
I want to first make a cross-join of item, subitem with itself, so I have all possible combinations of item1, subitem1 and item2, subitem2.
Here's how the stages are defined: if a stage is eligible, the item is just eligible; if it is picked, the item is both eligible and picked; and if it is shipped, the item is eligible, picked, and shipped. Only the latest stage is recorded.
Now, for every (item1, subitem1) - (item2, subitem2) pair, I want to count the id sessions in which the pair occurs with a stage in (eligible, picked, shipped) and store that value in pair_volume. For example, the (i1, s1) - (i1, s2) pair occurs twice (once in id 1 and once in id 3), and in both of those sessions both items in the pair were eligible (which is implied by picked and shipped). Out of the 2 times this eligible pair occurred, how many times was item1 picked, item2 picked, item1 shipped, and item2 shipped? That is what I am trying to solve.
I can get as far as the cross join, but I don't know how to insert values into the cross-joined table. The real problem is more complex than what I have described here. Any help is much appreciated!
Here's the query I have so far:
with new as
(
select combination.item1, combination.subitem1, combination.item2,
combination.subitem2,
(
select tmp.item1, tmp.subitem1, tmp.item2, tmp.subitem2,
case when ((tmp.item1 = tmp.item2) and
(tmp.subitem1 = tmp.subitem2)) then 'TRUE' else
'FALSE' end as indicator from
(
select distinct item as item1, subitem as subitem2
from old
cross join
select distinct item as item2, subitem as subitem2
from old
) tmp
) combination
where combination.indicator = 'FALSE'
)
insert into new (pair_volume)
select count(id) as pair_volume
from old
where ((new.item1 = old.item) and (new.subitem1 = old.subitem) and
stage
in ('picked', 'eligible', 'shipped')) and
((new.item2 = old.item) and (new.subitem2 = old.subitem) and
stage
in ('picked', 'eligible', 'shipped')
This is essentially what I am trying to do, and the insert into statement keeps throwing an error. The part up to the cross join works fine, but I am having trouble inserting values into the table as well as building the right conditions for the output table I want. Any help is much appreciated!
However, this throws an error for me as follows:
SQL compilation error: syntax error line 4 at position 0 unexpected 'insert'.
I tried including a semicolon before the insert into statement, and this is what I get:
SQL compilation error: syntax error line 3 at position 1 unexpected ';'.
I tried including a comma before the insert into statement, and this is what I get:
SQL compilation error: syntax error line 4 at position 0 unexpected 'insert'. syntax error line 5 at position 0 unexpected 'select'.
What am I doing wrong here? Based on a lot of other posts, I figured this is how we use insert into with a CTE.
If you are not obliged to use a CTE (using a CTE does not make much sense when you want to create a new table), you can try something like:
INSERT INTO new (tmp1, tmp2)
SELECT tmp1,tmp2
FROM old
Edit: Assuming SQL Server
As stated in the comments, you are very close but a CTE doesn't create a table, which is why it doesn't work.
This bit is fine:
with new as
(
select combination.item1, combination.subitem1, combination.item2,
combination.subitem2,
(
select tmp.item1, tmp.subitem1, tmp.item2, tmp.subitem2,
case when ((tmp.item1 = tmp.item2) and
(tmp.subitem1 = tmp.subitem2)) then 'TRUE' else
'FALSE' end as indicator from
(
select distinct item as item1, subitem as subitem2
from old
cross join
select distinct item as item2, subitem as subitem2
from old
) tmp
) combination
where combination.indicator = 'FALSE'
)
The next line:
insert into new (pair_volume)
You cannot insert into new. There is no table called "new". Even if you could insert into the CTE, there is no column called pair_volume.
You need to create a separate table to hold the output that you are building.
Do you have a fixed number of subitems? If not, it becomes more complex: you will need dynamic SQL to build a query that gets the result you want, or, if your RDBMS supports it, you could use SELECT INTO to create a table from the result set.
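For illustration only, here is a rough sketch of how such a table could be materialized in one statement, assuming a dialect that supports CREATE TABLE ... AS SELECT (on SQL Server the same SELECT with an INTO clause would play that role). It also assumes that a pair "occurs" when both halves share the same id session, which may need adjusting to the real rules:
-- Sketch: build the pair table directly with a self-join and conditional aggregation
create table new as
select a.item    as item1,
       a.subitem as subitem1,
       b.item    as item2,
       b.subitem as subitem2,
       count(distinct a.id) as pair_volume,
       count(distinct case when a.stage in ('picked', 'shipped') then a.id end) as item1pick,
       count(distinct case when a.stage = 'shipped'              then a.id end) as item1ship,
       count(distinct case when b.stage in ('picked', 'shipped') then b.id end) as item2pick,
       count(distinct case when b.stage = 'shipped'              then b.id end) as item2ship
from old a
join old b
  on a.id = b.id                                   -- same id session (assumption)
 and (a.item <> b.item or a.subitem <> b.subitem)  -- skip self-pairs
where a.stage in ('eligible', 'picked', 'shipped')
  and b.stage in ('eligible', 'picked', 'shipped')
group by a.item, a.subitem, b.item, b.subitem;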
I would like to put the result of the following query in a variable, but I get the error:
"subquery return more than one value"
select (select count(cb) from pater b where b.cb=a.parent) AS test_pater
from pater a
I would like to count the number of times each row's parent value appears in the cb column, and put the result in a test_pater column for each row.
expected results:
I am reading between the lines on this one, but if your goal is to count the number of times each parent occurs in the "cb" column of that table, I think something like this will give you that.
select cb as parent, count (*) as occurrences_in_cb
from pater p
where exists (
select null
from pater c
where p.cb = c.parent
)
group by cb
Depending on what you are doing with it, you might not need the semi-join ("exists clause"). That is only there to prevent cb entities that are not parents from being in the query results.
It will not, however, give you zero counts. Wasn't sure if that was important or not, as it was listed in your example. Short of understanding your use case, it's hard to tell.
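If the zero counts do matter, a hedged alternative is to keep the per-row shape of the original correlated subquery but express it as a LEFT JOIN. This sketch assumes the pater(cb, parent) columns shown in the question and that cb uniquely identifies a row:
-- Sketch: per-row count of how often a.parent appears in cb, keeping zeros
SELECT a.cb,
       a.parent,
       COUNT(b.cb) AS test_pater   -- COUNT over the joined side is 0 when there is no match
FROM pater a
LEFT JOIN pater b ON b.cb = a.parent
GROUP BY a.cb, a.parent;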
I've had to guess some at your table structure, but I think this may be what you want?
http://sqlfiddle.com/#!9/fdc493/3
Creating your table structure
CREATE TABLE pater (cb int, parent int);
INSERT INTO pater VALUES
(1,0), -- root
(2,1), (3,1), -- parent 1 has 2 children
(4,3), (5,3); -- parent 3 has 2 children
Your query
SELECT cb, (SELECT count(cb) FROM pater b WHERE b.parent = a.cb) as test_pater FROM pater a;
Results
| cb | test_pater |
|----|------------|
| 1 | 2 |
| 2 | 0 |
| 3 | 2 |
| 4 | 0 |
| 5 | 0 |
I want to translate a template in an SQL query. Let's assume there are the following four tables: state, stateproperty, state_stateproperty and translation:
state_stateproperty
| state_id | stateproperties_id |
|----------|--------------------|
| 1        | 2                  |
| 1        | 3                  |

stateproperty
| id | key          | value |
|----|--------------|-------|
| 2  | ${firstName} | John  |
| 3  | ${lastName}  | Doe   |

state
| id | template |
|----|----------|
| 1  | template |

translation
| language | messageId | value                           |
|----------|-----------|---------------------------------|
| en       | template  | ${lastName}, ${firstName} alarm |
The aim is to get a new entity named translatedstate that includes the translated template of the state. In this example the translated template would look like "Doe, John alarm". How can you join a many-to-many table in native SQL and translate the template of the state with the values of its related state properties?
To be honest, I would create a little function that loops through your state properties and cumulatively replaces each wildcard found with its text.
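A sketch of that function approach, assuming PostgreSQL/plpgsql and the table and column names shown in the question (translate_state is a hypothetical name; the template lookup may differ in the real schema):
-- Sketch: cumulatively replace each ${key} with its value
CREATE OR REPLACE FUNCTION translate_state(p_state_id int, p_language text)
RETURNS text AS $$
DECLARE
    v_template text;
    r          record;
BEGIN
    -- look up the raw template for the state in the requested language
    SELECT t.value INTO v_template
    FROM state s
    JOIN translation t ON t.messageId = s.template AND t.language = p_language
    WHERE s.id = p_state_id;

    -- replace each wildcard with the corresponding property value
    FOR r IN
        SELECT sp.key, sp.value
        FROM state_stateproperty ssp
        JOIN stateproperty sp ON sp.id = ssp.stateproperties_id
        WHERE ssp.state_id = p_state_id
    LOOP
        v_template := replace(v_template, r.key, r.value);
    END LOOP;

    RETURN v_template;
END;
$$ LANGUAGE plpgsql;
-- usage (expected to yield 'Doe, John alarm' for the sample data):
-- SELECT translate_state(1, 'en');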
But I had some fun solving it in a single query. I am not sure it covers all special cases, but for your example it works:
demo:db<>fiddle
SELECT
string_agg( -- 8
regexp_replace(split_key, '^.*\}', value), -- 7
'' ORDER BY row_number
)
FROM (
SELECT
s.id,
sp.value,
substring(key, 3) as s_key, -- 5
split_table.*
FROM translation t
JOIN statechange sc ON t.messageid = sc.completemessagetemplateid -- 1
JOIN state s ON s.id = sc.state_id
JOIN state_stateproperty ssp ON s.id = ssp.state_id
JOIN stateproperty sp ON ssp.stateproperties_id = sp.id
JOIN translation stnme ON s.nameid = stnme.messageid
CROSS JOIN
regexp_split_to_table( -- 3
-- 2
replace(t.messagetranslation, '${state}', stnme.messagetranslation),
'\$\{'
) WITH ORDINALITY as split_table(split_key, row_number) -- 4
WHERE t.language = 'en'
) s
WHERE position(s_key in split_key) = 0 and split_key != '' -- 6
GROUP BY id -- 8
1. Simply join the tables together (for next time you could simplify your example a little so that we don't have to create all these different tables; I am sure you know how to join).
2. Hard-replace the ${state} variable with the state nameid.
3. This splits the template string every time a ${ sequence is found, so it creates a new row that begins with a certain wildcard. Note that ${firstName} becomes firstName} because the string delimiter is deleted.
4. Adding a row count gives a criterion for how the rows are ordered when I aggregate them later in (8). WITH ORDINALITY only works as part of the FROM clause, so the whole function has been added here with a join.
5. Because of (3) I strip the ${ part from the keys as well, so they can be parsed and compared more easily later (in 6).
6. Because (3) creates too many rows (cross join) I want only those rows where the key is the first wildcard of my split string. All others are wrong.
7. Now I replace the wildcard with its value.
8. Because we have only one wildcard per row, we need to merge them back into one string (grouped by state id). To achieve the right order, we use the row number from (4).
I am sorry for what may be a long post in advance.
Background:
I am using Rational Team Concert (RTC), which stores work item data, in conjunction with Jazz Reporting Service to create reports. The Report Builder tool allows you to write your own queries to pull data as a table, and it has its own interface to represent the table as a graph.
There are not many options for graphing; the chart type defaults to a count unless you specify that it should show a sum. In order to graph by sum, the data must be a number rather than a string, but by default the Report Builder assumes all variables in the SELECT statement are strings.
The data which I will be using are a bunch of work items. Each work item is associated to a team (A, B) and has a work estimation number (count1, count2).
Item # | Team | Work |
------------------------
123 | A | count1 |
------------------------
124 | A | count2 |
------------------------
125 | B | count2 |
------------------------
....
Problem:
Since the work estimation is entered as a Tag, the first step was to use a CASE WHEN expression in the SELECT to transform count1 -> 1 and count2 -> 2 (the string tag into an actual number which can be summed). This resulted in a table with the numbers 1 and 2 in place of the typed tag (good so far).
Item # | Team | Work |
------------------------
123 | A | 1 |
------------------------
124 | A | 2 |
------------------------
125 | B | 2 |
------------------------
....
The problem is that I am trying to graph by sum, which means getting the tool to identify the variables in the SELECT statement as numbers, yet for some reason any variable I declare in a SELECT statement is always treated as a string. (The tool shows a table of the current columns, i.e. the variables in the SELECT, along with the type it identifies for each.)
Attempted Solutions:
The first query I did was to return a table of each work item with its team name and work estimate
SELECT T1.NAME,
(CASE WHEN T1.TAGs='count1' THEN 1 ELSE 2 END) AS WORK
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
Which resulted in
Team | Work |
----------------
A | 1 |
----------------
A | 2 |
----------------
B | 2 |
----------------
....
but the tool still sees the numbers as strings. I then tried explicitly casting the CASE to an integer, but that resulted in the same issue:
...
CAST(CASE WHEN T1.TAGs='count1' THEN 1 ELSE 2 END AS Integer) AS WORK
...
Which again the tool still represents as a string.
Current Goal:
Since I cannot confirm whether the tool has an underlying problem, compatibility issues with queries, etc., what I believe will work now is to return a table with two rows: the sum of the work for each team.
       | Sum of 1's and 2's |
-------|--------------------|
Team A | SUM(1) + SUM(2)    |
Team B | SUM(1) + SUM(2)    |
What I am having trouble with is using subqueries so that SUM can total the data. When I try SUM(CASE WHEN ... END) AS TIME2 I get the error "Column modifiers AVG and SUM apply only to number attributes". This has me thinking that I need a subquery which returns the column after the CASE, and then SUM that, but I am sailing into uncharted waters and can't seem to get the syntax to work.
I understand that a post like this might be better off on the product help forum; I have tried asking around but cannot get any help. The solution I am proposing, returning the two-row table, should bypass any issues the software may have, but I need help writing the subquery around the SUM when a CASE is involved.
I appreciate your time and help!
EDIT 1:
Below is the full query, which performs the CASE correctly but whose column the tool still interprets as a string:
SELECT
T1.Name,
CAST(CASE WHEN T1.TAGS='|release_points_1|' THEN 1 ELSE (CASE WHEN T1.TAGS='|release_points_2|' THEN 2 ELSE 0 END) END AS Integer) AS TAG
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
AND
(T1.ISSOFTDELETED = 0) AND
(T1.REQUEST_ID <> -1 AND T1.REQUEST_ID IS NOT NULL)
This small adjustment to your current query should work:
SELECT
T1.Name,
SUM(CAST(CASE WHEN T1.TAGS='|release_points_1|' THEN 1 ELSE (CASE WHEN T1.TAGS='|release_points_2|' THEN 2 ELSE 0 END) END AS Integer)) AS TAG
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
AND
(T1.ISSOFTDELETED = 0) AND
(T1.REQUEST_ID <> -1 AND T1.REQUEST_ID IS NOT NULL)
GROUP BY T1.Name
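If the tool still reports the summed column as a string, the derived-table shape hinted at in the question would look roughly like this. It is only a sketch against the same RIDW.VW_REQUEST view and filters, TOTAL_WORK is an illustrative alias, and whether Report Builder then recognizes the type cannot be confirmed here:
SELECT
    t.Name,
    SUM(t.TAG) AS TOTAL_WORK
FROM (
    -- inner query produces one integer per work item
    SELECT
        T1.Name,
        CAST(CASE WHEN T1.TAGS = '|release_points_1|' THEN 1
                  WHEN T1.TAGS = '|release_points_2|' THEN 2
                  ELSE 0 END AS Integer) AS TAG
    FROM RIDW.VW_REQUEST T1
    WHERE T1.PROJECT_ID = 73
      AND T1.ISSOFTDELETED = 0
      AND T1.REQUEST_ID <> -1 AND T1.REQUEST_ID IS NOT NULL
) t
GROUP BY t.Name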
I have googled and read through some articles like
this postgreSQL manual page
or this blog page
and tried writing queries myself with moderate success (some of them hang, while others work well and fast),
but so far I cannot completely understand how this magic works.
Can anybody give a very clear explanation of the semantics and execution process of such queries,
preferably based on typical samples like factorial calculation or full tree expansion from an (id, parent_id, name) table?
And what are the basic guidelines and typical mistakes one should know in order to do well with recursive queries?
First of all, let us try to simplify and clarify the algorithm description given on the manual page. To simplify it, consider only union all in the with recursive clause for now (union comes later):
WITH RECURSIVE pseudo-entity-name(column-names) AS (
Initial-SELECT
UNION ALL
Recursive-SELECT using pseudo-entity-name
)
Outer-SELECT using pseudo-entity-name
To clarify it, let us describe the query execution process in pseudocode:
working-recordset = result of Initial-SELECT
append working-recordset to empty outer-recordset
while( working-recordset is not empty ) begin
new working-recordset = result of Recursive-SELECT
taking previous working-recordset as pseudo-entity-name
append working-recordset to outer-recordset
end
overall-result = result of Outer-SELECT
taking outer-recordset as pseudo-entity-name
Or, even shorter: the database engine executes the initial select, taking its result rows as the working set. Then it repeatedly executes the recursive select on the working set, each time replacing the contents of the working set with the query result obtained. This process ends when the recursive select returns an empty set. All the result rows produced first by the initial select and then by the recursive select are gathered and fed to the outer select, whose result becomes the overall query result.
This query is calculating factorial of 3:
WITH RECURSIVE factorial(F,n) AS (
SELECT 1 F, 3 n
UNION ALL
SELECT F*n F, n-1 n from factorial where n>1
)
SELECT F from factorial where n=1
The initial select SELECT 1 F, 3 n gives us the initial values: 3 for the argument and 1 for the function value.
The recursive select SELECT F*n F, n-1 n from factorial where n>1 states that on every step we multiply the last function value by the last argument value and decrement the argument value.
The database engine executes it like this:
First of all, it executes the initial select, which gives the initial state of the working recordset:
F | n
--+--
1 | 3
Then it transforms the working recordset with the recursive select and obtains its second state:
F | n
--+--
3 | 2
Then third state:
F | n
--+--
6 | 1
In the third state there is no row that satisfies the n>1 condition of the recursive select, so the fourth working set is empty and the loop exits.
The outer recordset now holds all the rows returned by the initial and recursive selects:
F | n
--+--
1 | 3
3 | 2
6 | 1
The outer select filters out all intermediate results from the outer recordset, showing only the final factorial value, which becomes the overall query result:
F
--
6
And now let us consider table forest(id,parent_id,name):
id | parent_id | name
---+-----------+-----------------
1 | | item 1
2 | 1 | subitem 1.1
3 | 1 | subitem 1.2
4 | 1 | subitem 1.3
5 | 3 | subsubitem 1.2.1
6 | | item 2
7 | 6 | subitem 2.1
8 | | item 3
'Expanding the full tree' here means sorting the tree items in human-readable depth-first order while calculating their levels and (maybe) paths. Neither task (correct sorting, calculating the level or path) is solvable in one SELECT (or even any constant number of SELECTs) without the WITH RECURSIVE clause (or Oracle's CONNECT BY clause, which PostgreSQL does not support). But this recursive query does the job (well, almost; see the note below):
WITH RECURSIVE fulltree(id,parent_id,level,name,path) AS (
SELECT id, parent_id, 1 as level, name, name||'' as path from forest where parent_id is null
UNION ALL
SELECT t.id, t.parent_id, ft.level+1 as level, t.name, ft.path||' / '||t.name as path
from forest t, fulltree ft where t.parent_id = ft.id
)
SELECT * from fulltree order by path
The database engine executes it like this:
First, it executes the initial select, which gives all top-level items (roots) from the forest table:
id | parent_id | level | name | path
---+-----------+-------+------------------+----------------------------------------
1 | | 1 | item 1 | item 1
8 | | 1 | item 3 | item 3
6 | | 1 | item 2 | item 2
Then it executes the recursive select, which gives all 2nd-level items from the forest table:
id | parent_id | level | name | path
---+-----------+-------+------------------+----------------------------------------
2 | 1 | 2 | subitem 1.1 | item 1 / subitem 1.1
3 | 1 | 2 | subitem 1.2 | item 1 / subitem 1.2
4 | 1 | 2 | subitem 1.3 | item 1 / subitem 1.3
7 | 6 | 2 | subitem 2.1 | item 2 / subitem 2.1
Then it executes the recursive select again, retrieving 3rd-level items:
id | parent_id | level | name | path
---+-----------+-------+------------------+----------------------------------------
5 | 3 | 3 | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1
Now it executes the recursive select once more, trying to retrieve 4th-level items, but there are none, so the loop exits.
The outer SELECT sets the correct human-readable row order by sorting on the path column:
id | parent_id | level | name | path
---+-----------+-------+------------------+----------------------------------------
1 | | 1 | item 1 | item 1
2 | 1 | 2 | subitem 1.1 | item 1 / subitem 1.1
3 | 1 | 2 | subitem 1.2 | item 1 / subitem 1.2
5 | 3 | 3 | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1
4 | 1 | 2 | subitem 1.3 | item 1 / subitem 1.3
6 | | 1 | item 2 | item 2
7 | 6 | 2 | subitem 2.1 | item 2 / subitem 2.1
8 | | 1 | item 3 | item 3
NOTE: The resulting row order will remain correct only as long as the item names contain no punctuation characters that collate before /. If we rename Item 2 to Item 1 *, it will break the row order by landing between Item 1 and its descendants.
A more stable solution is to use the tab character (E'\t') as the path separator in the query (it can be substituted with a more readable separator later, in the outer select, before displaying to a human, etc.). Tab-separated paths retain the correct order as long as there are no tabs or control characters in the item names, which can easily be checked and ruled out without loss of usability.
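A sketch of that variant under the same assumptions as the query above: only the path expressions change, and the outer select swaps the tab back for a readable separator while still sorting on the raw path:
WITH RECURSIVE fulltree(id, parent_id, level, name, path) AS (
    SELECT id, parent_id, 1 AS level, name, name||'' AS path
    FROM forest WHERE parent_id IS NULL
    UNION ALL
    SELECT t.id, t.parent_id, ft.level + 1, t.name, ft.path||E'\t'||t.name
    FROM forest t, fulltree ft WHERE t.parent_id = ft.id
)
SELECT id, parent_id, level, name,
       replace(path, E'\t', ' / ') AS display_path  -- readable separator for output only
FROM fulltree
ORDER BY path   -- sort on the raw tab-separated path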
It is very simple to modify the fulltree query to expand any arbitrary subtree: you only need to substitute the condition parent_id is null with parent_id = 1 (for example). Note that this query variant returns all levels and paths relative to Item 1.
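For example, the modified initial select for the subtree under Item 1 could look like this sketch (the recursive and outer selects stay unchanged):
-- Anchor member for the subtree under Item 1; levels and paths are relative to it
SELECT id, parent_id, 1 AS level, name, name||'' AS path
FROM forest WHERE parent_id = 1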
And now about typical mistakes. The most notable mistake specific to recursive queries is defining a faulty stop condition in the recursive select, which results in infinite looping.
For example, if we omit the where n>1 condition in the factorial sample above, the recursive select will never return an empty set (because there is no condition to filter out its single row) and the looping will continue forever.
That is the most probable reason why some of your queries hang (the other, non-specific but still possible, reason is a very inefficient select, which executes in finite but very long time).
There are not many RECURSIVE-specific querying guidelines to mention, as far as I know. But I would like to suggest a (rather obvious) step-by-step recursive query building procedure.
Separately build and debug your initial select.
Wrap it in the scaffolding WITH RECURSIVE construct and begin building and debugging your recursive select.
The recommended scaffolding construct looks like this:
WITH RECURSIVE rec( <Your column names> ) AS (
<Your ready and working initial SELECT>
UNION ALL
<Recursive SELECT that you are debugging now>
)
SELECT * from rec limit 1000
This simplest outer select outputs the whole outer recordset, which, as we know, contains all rows produced first by the initial select and then by every execution of the recursive select in the loop, in their original output order, just like in the samples above. The limit 1000 part prevents hanging, replacing it with oversized output in which you will be able to see the missed stop point.
After debugging the initial and recursive selects, build and debug your outer select.
And now the last thing to mention: the difference when using union instead of union all in the with recursive clause. It introduces a row-uniqueness constraint, which results in two extra lines in our execution pseudocode:
working-recordset = result of Initial-SELECT
discard duplicate rows from working-recordset /*union-specific*/
append working-recordset to empty outer-recordset
while( working-recordset is not empty ) begin
new working-recordset = result of Recursive-SELECT
taking previous working-recordset as pseudo-entity-name
discard duplicate rows and rows that have duplicates in outer-recordset
from working-recordset /*union-specific*/
append working-recordset to outer-recordset
end
overall-result = result of Outer-SELECT
taking outer-recordset as pseudo-entity-name
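Where this difference matters in practice: a sketch with a small hypothetical edges(src, dst) table that contains a cycle. With union the duplicate rows are discarded, so the traversal terminates; the same query with union all would loop forever:
-- Hypothetical cyclic graph: 1 -> 2 -> 3 -> 1
CREATE TABLE edges (src int, dst int);
INSERT INTO edges VALUES (1, 2), (2, 3), (3, 1);

WITH RECURSIVE reachable(node) AS (
    SELECT 1                      -- start node
    UNION                         -- duplicates are discarded, so the traversal stops
    SELECT e.dst
    FROM edges e
    JOIN reachable r ON e.src = r.node
)
SELECT node FROM reachable;
-- returns 1, 2, 3 and terminates; with UNION ALL it would recurse endlessly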
I have a simple table with a "versioning" scheme:
Version | PartKey1 | PartKey2 | Value
1 | 0 | 0 | foo
2 | 0 | 0 | bar
1 | 1 | 0 | foobar
This table is medium-sized (~100,000 rows for a full version). At the start it is loaded with version 1, which contains a full snapshot; over time incremental updates are added, but since we want to preserve the old versions, they are added with an incremented "Version" number (2 here).
When reading the data, I want to be able to specify a maximum version, and I would like, if possible, to only retrieve the "rows" I am interested in.
For example, specifying 2 as the maximum version, I would like a query that retrieves only 2 rows from the table above:
Version | PartKey1 | PartKey2 | Value
2 | 0 | 0 | bar
1 | 1 | 0 | foobar
The row:
1 | 0 | 0 | foo
is discarded because version 2 of this row is more recent.
I was wondering whether such a selection is possible / advisable in an SQL query. I can do the filtering on the application side, but that obviously means pulling useless rows from the DB, so if it's possible (and cheap on the DB side) I'd rather offload this work to the DB.
You can do:
SELECT v1.*
FROM versioningscheme v1
LEFT JOIN versioningscheme v2
ON v2.partkey1 = v1.partkey1 AND v2.partkey2 = v1.partkey2
AND v2.version > v1.version
WHERE v2.version IS NULL
A LEFT JOIN with NULL detection is very powerful and underused. NULL values are returned when there is no match (and obviously, when v1 already holds the max row, no row in v2 can satisfy the join condition).
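If the maximum-version parameter from the question also has to be honored, presumably both sides of the join need a version filter as well; a sketch with the maximum hard-coded as 2:
-- Sketch: same anti-join pattern, restricted to versions up to a maximum (here 2)
SELECT v1.*
FROM versioningscheme v1
LEFT JOIN versioningscheme v2
  ON  v2.partkey1 = v1.partkey1
  AND v2.partkey2 = v1.partkey2
  AND v2.version  > v1.version
  AND v2.version <= 2
WHERE v1.version <= 2
  AND v2.version IS NULL;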
select t.*
from MyTable t
inner join (
select PartKey1, PartKey2, max(Version) as MaxVersion
from MyTable
where Version <= 2
group by PartKey1, PartKey2
) tm on t.PartKey1 = tm.PartKey1
and t.PartKey2 = tm.PartKey2
and t.Version = tm.MaxVersion
This is common with time-varying data (where you want the most recent value within a specific window of time), and is completely reasonable.
In your case, ROW_NUMBER() allows the data to be scanned just once rather than multiple times. With an appropriate index such as (PartKey1, PartKey2, Version), this should be exceptionally quick; a sketch of that index follows the query below.
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY PartKey1, PartKey2 ORDER BY Version DESC) AS reversed_version
FROM
MyTable
WHERE
Version <= <MaxVersionParameter>
)
AS data
WHERE
reversed_version = 1
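For completeness, a sketch of the index suggested above (the name is illustrative):
CREATE INDEX ix_mytable_partkeys_version
    ON MyTable (PartKey1, PartKey2, Version);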