Two queries work separately. Errors when combined. How do I combine two SELECT statements in LibreOffice Base? - hsqldb

I've reached a point with a spreadsheet where it is no longer viable to keep the data in that format. I've created a table in LibreOffice Base with the relevant information and I'm trying to put together some queries. Unfortunately, so far my attempts to write a SQL query have been met with syntax errors. That is to be expected, given it's all new to me.
Here's my example table:
TINYINT-A | TINYINT-B | NUMERIC-A | NUMERIC-B | BOOLEAN-A | BOOLEAN-B
1         | 2         | 100       | 200       | 1         | 0
9         | 8         | 900       | 800       | 0         | 1
I have the following query running fine:
SELECT
SUM("TINYINT-A") AS "First Column",
SUM("TINYINT-B") AS "Second Column",
SUM("NUMERIC-A") AS "Third Column",
SUM("NUMERIC-B") AS "Fourth Column"
FROM
"Table-A"
Output would be:
First Column | Second Column | Third Column | Fourth Column
10           | 10            | 1000         | 1000
I would like to add a fifth column that sums one of the previous four columns over only the rows where a boolean value is 1 (or 0). As a separate query, I can do this:
SELECT
SUM("NUMERIC-A") AS "BOOLEAN-A-NUMERIC-A",
SUM("NUMERIC-B") AS "BOOLEAN-A-NUMERIC-B"
FROM
"Table-A"
WHERE
"BOOLEAN-A" = 1
Expected output:
BOOLEAN-A-NUMERIC-A | BOOLEAN-A-NUMERIC-B
100                 | 200
However, if I try to put the two into one query so that the output above is tacked on to the end of the first output, I get a syntax error. This is my attempt at combining the two:
SELECT
(
SELECT
SUM("TINYINT-A") AS "First Column",
SUM("TINYINT-B") AS "Second Column",
SUM("NUMERIC-A") AS "Third Column",
SUM("NUMERIC-B") AS "Fourth Column"
FROM
"Table-A"
),
(
SELECT
SUM("NUMERIC-A") AS "BOOLEAN-A-NUMERIC-A",
SUM("NUMERIC-B") AS "BOOLEAN-A-NUMERIC-B"
FROM
"Table-A"
WHERE
"BOOLEAN-A" = 1
)
FROM
"Table-A"
I forgot which SO question I tried to derive the structure of the above from, but it clearly didn't work, so either I didn't understand it correctly, or I have left out a character somewhere.
I also tried taking the two separate queries exactly as they are and putting a line containing just UNION between them. This results in an error stating that the given command is not a SELECT statement. I'm guessing this is because the two statements don't have the same output structure.
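The guess above is on the right track, and there is a second catch. Below is a sketch using Python's built-in sqlite3 module as a stand-in for HSQLDB (an assumption; Base's error text will differ): a UNION whose SELECTs return different numbers of columns is rejected, and even when the counts match, UNION ALL appends the second result as extra rows, not extra columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the Base table; the flags are stored as 1/0 integers.
conn.execute('''CREATE TABLE "Table-A" (
    "TINYINT-A" INTEGER, "TINYINT-B" INTEGER,
    "NUMERIC-A" NUMERIC, "NUMERIC-B" NUMERIC,
    "BOOLEAN-A" INTEGER, "BOOLEAN-B" INTEGER)''')
conn.executemany('INSERT INTO "Table-A" VALUES (?, ?, ?, ?, ?, ?)',
                 [(1, 2, 100, 200, 1, 0), (9, 8, 900, 800, 0, 1)])

# 1) Four columns UNION two columns: rejected when the statement is prepared.
try:
    conn.execute('''
        SELECT SUM("TINYINT-A"), SUM("TINYINT-B"), SUM("NUMERIC-A"), SUM("NUMERIC-B")
        FROM "Table-A"
        UNION
        SELECT SUM("NUMERIC-A"), SUM("NUMERIC-B") FROM "Table-A" WHERE "BOOLEAN-A" = 1
    ''')
    union_error = None
except sqlite3.OperationalError as exc:
    union_error = str(exc)

# 2) Two columns UNION ALL two columns: accepted, but the second result
#    becomes an extra ROW rather than extra COLUMNS.
stacked = conn.execute('''
    SELECT SUM("NUMERIC-A"), SUM("NUMERIC-B") FROM "Table-A"
    UNION ALL
    SELECT SUM("NUMERIC-A"), SUM("NUMERIC-B") FROM "Table-A" WHERE "BOOLEAN-A" = 1
''').fetchall()
print(union_error)
print(stacked)  # two rows: (1000, 1000) and (100, 200)
```

So putting extra values next to the totals needs them in the same SELECT list, not a UNION.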
I'm not even sure whether the commands are the same in Base, or whether the syntax varies significantly between it and other databases such as MySQL. I suspect it does, and that I'm doing something comparable to trying to execute Python using HTML tags/syntax.

I don't know LibreOffice (I use Postgres), but maybe it works the same way and this gives you an idea.
Given:
CREATE TABLE Table_A (
TINYINT_A SMALLINT,
TINYINT_B SMALLINT,
NUMERIC_A NUMERIC,
NUMERIC_B NUMERIC,
BOOLEAN_A BOOLEAN,
BOOLEAN_B BOOLEAN
);
INSERT INTO Table_A (
TINYINT_A,
TINYINT_B,
NUMERIC_A,
NUMERIC_B,
BOOLEAN_A,
BOOLEAN_B
)
VALUES
(1,2,100,200,true,false),
(9,8,900,800,false,true);
In Postgres it works with subqueries like this, although I'm sure there are better solutions:
SELECT
SUM(TINYINT_A) AS "First Column",
SUM(TINYINT_B) AS "Second Column",
SUM(NUMERIC_A) AS "Third Column",
SUM(NUMERIC_B) AS "Fourth Column",
(SELECT SUM(NUMERIC_A) FROM Table_A WHERE BOOLEAN_A is true) AS BOOLEAN_A_NUMERIC_A,
(SELECT SUM(NUMERIC_B) FROM Table_A WHERE BOOLEAN_A is true) AS BOOLEAN_A_NUMERIC_B
FROM Table_A
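For comparison, here is a sketch that runs the answer's scalar-subquery form next to conditional aggregation (SUM over a CASE expression), a common alternative that reads the table only once. It uses Python's built-in sqlite3 module as a stand-in for HSQLDB/Postgres, which is an assumption: SQLite has no BOOLEAN type, so the flags are stored as 1/0, and Base may need a slightly different comparison for a real BOOLEAN column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('''CREATE TABLE "Table-A" (
    "TINYINT-A" INTEGER, "TINYINT-B" INTEGER,
    "NUMERIC-A" NUMERIC, "NUMERIC-B" NUMERIC,
    "BOOLEAN-A" INTEGER, "BOOLEAN-B" INTEGER)''')
conn.executemany('INSERT INTO "Table-A" VALUES (?, ?, ?, ?, ?, ?)',
                 [(1, 2, 100, 200, 1, 0), (9, 8, 900, 800, 0, 1)])

# Conditional aggregation: CASE yields NULL for rows failing the condition,
# and SUM ignores NULLs, so only the "BOOLEAN-A" = 1 rows are added up.
row = conn.execute('''
    SELECT
        SUM("TINYINT-A") AS "First Column",
        SUM("TINYINT-B") AS "Second Column",
        SUM("NUMERIC-A") AS "Third Column",
        SUM("NUMERIC-B") AS "Fourth Column",
        SUM(CASE WHEN "BOOLEAN-A" = 1 THEN "NUMERIC-A" END) AS "BOOLEAN-A-NUMERIC-A",
        SUM(CASE WHEN "BOOLEAN-A" = 1 THEN "NUMERIC-B" END) AS "BOOLEAN-A-NUMERIC-B"
    FROM "Table-A"
''').fetchone()

# The answer's scalar-subquery form, for comparison.
row2 = conn.execute('''
    SELECT SUM("TINYINT-A"), SUM("TINYINT-B"), SUM("NUMERIC-A"), SUM("NUMERIC-B"),
        (SELECT SUM("NUMERIC-A") FROM "Table-A" WHERE "BOOLEAN-A" = 1),
        (SELECT SUM("NUMERIC-B") FROM "Table-A" WHERE "BOOLEAN-A" = 1)
    FROM "Table-A"
''').fetchone()
print(row)   # (10, 10, 1000, 1000, 100, 200)
print(row2)  # (10, 10, 1000, 1000, 100, 200)
```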

Related

SQL - Similar Update Queries Produce Varying Results

I am super new to SQL and have two queries I think should produce the same output but they don't. Can someone figure out the difference between them?
The input table for this simple example has two columns, letter and extra. The data in the first column is a random letter from the list ['a', 'b', 'c', 'd', 'e'] and extra should not matter (I think?). These are the queries:
update
tbl
set
extra = letter;
and:
update
tbl
set
extra = (select
letter
from tbl);
The resulting tables these produce are:
e|e
e|e
c|c
e|e
b|b
...
and:
e|e
e|e
c|e
e|e
b|e
...
respectively.
I expect the first output for both queries, how come the second one turns out as it does?
EDIT:
The reason I ask this question is because what I want to do is a bit more involved than this simple example and I believe I need the subquery. I am trying to add a kind of normalisation column, like this:
update
tbl
set
extra = 1 / (select
norm
from
tbl
INNER JOIN
(SELECT
letter, count(*) as norm
FROM
tbl
GROUP BY letter) as tmp
ON
tbl.letter = tmp.letter);
Alas, this obviously doesn't work because of the above.
What your first query is saying:
Set the value of extra to the value of letter in the same row.
What the second query is saying:
Pick a value from the column "letter" in the table, and update every row in the table to have the column 'extra' contain that value.
They are different instructions, so you get different results.
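A sketch of the difference, using Python's sqlite3 (the "e|e" output suggests SQLite, but that is an assumption). The uncorrelated subquery gives every row the same letter; SQLite silently uses the first row of a multi-row scalar subquery, while many other databases raise an error instead. A correlated subquery, which refers to the outer row, is re-evaluated per row and is one way to build the normalisation column from the EDIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (letter TEXT, extra)")
conn.executemany("INSERT INTO tbl (letter) VALUES (?)",
                 [("e",), ("e",), ("c",), ("e",), ("b",)])

# Uncorrelated: the subquery never looks at the row being updated,
# so one value is written into every row.
conn.execute("UPDATE tbl SET extra = (SELECT letter FROM tbl)")
uncorrelated = [r[0] for r in conn.execute("SELECT extra FROM tbl ORDER BY rowid")]

# Correlated: t2.letter = tbl.letter ties the subquery to the outer row, so it
# counts the rows sharing this row's letter. 1.0 avoids integer division,
# which would turn 1/3 into 0.
conn.execute("""
    UPDATE tbl
    SET extra = 1.0 / (SELECT COUNT(*) FROM tbl AS t2 WHERE t2.letter = tbl.letter)
""")
normalised = [r[0] for r in conn.execute("SELECT extra FROM tbl ORDER BY rowid")]
print(uncorrelated)  # the same letter repeated five times
print(normalised)    # 1/3 for each 'e', 1.0 for 'c' and 'b'
```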

SQL Query - Return rows where based on dual conditioning of same column

I am struggling to figure this out. I have a column, "Column1", that is VARCHAR(max) and contains a lot of raw text. I want to filter out rows that contain a particular word in this column, UNLESS a secondary condition is met.
For example:
Return all rows where the word "Final" does not exist. However, if the word "Final" does exist but the word "Ongoing" also exists, then I need to return this row and not hide it.
Here is my code (not working):
SELECT *
FROM ThisTable
WHERE (
Column1 NOT LIKE '%Final%'
OR Column1 LIKE '%Ongoing%'
)
Here are some examples of data that exists, and whether the row should be returned.
Row 1: "this project has ended" (return this row)
Row 2: "this project has ended but is ongoing" (return this row)
Row 3: "this project is final." (do not return this row)
Row 4: "This project is final and ongoing" (return this row)
Your current query is filtering out records where neither 'Final' nor 'Ongoing' is contained, but as I understand it you want to return everything apart from records that include 'Final' but don't also contain 'Ongoing'.
I think something like this should work:
WHERE NOT (Column1 LIKE '%Final%' AND Column1 NOT LIKE '%Ongoing%')
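A check worth running: by De Morgan's law, NOT (A AND NOT B) is the same condition as (NOT A) OR B, so the answer's WHERE and the question's WHERE should select the same rows. The sketch below uses Python's sqlite3 as a stand-in for SQL Server (an assumption); SQLite's LIKE is case-insensitive for ASCII, so 'final' matches '%Final%'. If the real server uses a case-sensitive collation, row 3 would slip through under both versions, so collation may be worth checking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ThisTable (id INTEGER, Column1 TEXT)")
conn.executemany("INSERT INTO ThisTable VALUES (?, ?)", [
    (1, "this project has ended"),
    (2, "this project has ended but is ongoing"),
    (3, "this project is final."),
    (4, "This project is final and ongoing"),
])


def ids(where):
    """Return the ids of the rows matching a WHERE condition."""
    sql = f"SELECT id FROM ThisTable WHERE {where} ORDER BY id"
    return [r[0] for r in conn.execute(sql)]


answer = ids("NOT (Column1 LIKE '%Final%' AND Column1 NOT LIKE '%Ongoing%')")
question = ids("Column1 NOT LIKE '%Final%' OR Column1 LIKE '%Ongoing%'")
print(answer)    # [1, 2, 4]
print(question)  # [1, 2, 4]
```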

Sql column value as formula in select

Can I select a column based on another column's value being listed as a formula? So I have a table, something like:
column_name | formula   | val
one         | NULL      | 1
two         | NULL      | 2
three       | one + two | NULL
And I want to do
SELECT
column_name,
CASE WHEN formula IS NULL
THEN val
ELSE
(Here's where I'm confused - How do I evaluate the formula?)
END as result
FROM
table
And end up with a result set like
column_name | result
one         | 1
two         | 2
three       | 3
You keep saying column, and column name, but you're actually talking about rows, not columns.
The problem is that you (potentially) want a different formula for each row. For example, row 4 might be (two - one) = 1 or even (three + one) = 4, where you'd have to calculate row three before you could do row 4. This means that a simple select query that parses the formulas is going to be very hard to write: it would have to handle every type of formula, and formulas that reference other formulas make it harder still.
If you have to handle expressions like (two + one) * five = 15 and two + one * five = 7, then you'd basically be re-implementing a full-blown eval function. You might be better off returning the SQL table to another language that has eval functions built in, or you could use something like SQL Eval.net if it has to be in SQL.
Either way, though, you've still got to change "two + one" to "2 + 1" before you can do the eval with it. Because these values are in other rows, you can't see those values in the row you're looking at. To get the value for "one" you have to do something like
Select val from table where column_name = 'one'
And even then if the val is null, that means it hasn't been calculated yet, and you have to come back and try again later.
If I had to do something like this, I would create a temporary table and load the basic table into it. Then I'd iterate over the rows with null values, trying to replace column names with their literal values. I'd run the eval over any formulas that had no names left in them, setting the val for those rows. If there were still rows with no val (i.e. they were waiting for another row to be done first), I'd go back and iterate again. At the end, you should have a val for every row, at which point getting your results is a simple query.
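The iterate-until-resolved approach described above can be sketched like this, with Python's sqlite3 holding the temporary table. Assumptions, all hypothetical: the tables are named `formula_table` and `calc` (since `table` is reserved), every row has either a val or a formula, and formulas use only names, numbers, + - * / and parentheses. Instead of a general eval, a small `ast`-based evaluator accepts arithmetic only.

```python
import ast
import operator
import re
import sqlite3

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}
_NAME = re.compile(r"\b[A-Za-z_]\w*\b")  # a reference to another row


def eval_arith(expr):
    """Evaluate a purely numeric expression such as '(1) + (2)' without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


def resolve_formulas(conn):
    """Fill in val for every row of the temp table calc, one pass at a time."""
    while True:
        known = dict(conn.execute(
            "SELECT column_name, val FROM calc WHERE val IS NOT NULL"))
        pending = conn.execute(
            "SELECT column_name, formula FROM calc WHERE val IS NULL").fetchall()
        if not pending:
            return
        progress = False
        for name, formula in pending:
            if any(ref not in known for ref in _NAME.findall(formula)):
                continue  # waits on a row not calculated yet; retry next pass
            literal = _NAME.sub(lambda m: f"({known[m.group(0)]!r})", formula)
            conn.execute("UPDATE calc SET val = ? WHERE column_name = ?",
                         (eval_arith(literal), name))
            progress = True
        if not progress:
            raise ValueError("circular or unresolvable formulas")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE formula_table (column_name TEXT, formula TEXT, val NUMERIC)")
conn.executemany("INSERT INTO formula_table VALUES (?, ?, ?)", [
    ("one", None, 1), ("two", None, 2),
    ("three", "one + two", None), ("four", "three + one", None)])
# Work on a temporary copy so the base table keeps its NULLs.
conn.execute("CREATE TEMP TABLE calc AS SELECT * FROM formula_table")
resolve_formulas(conn)
rows = conn.execute("SELECT column_name, val FROM calc ORDER BY rowid").fetchall()
print(rows)  # [('one', 1), ('two', 2), ('three', 3), ('four', 4)]
```

Row four needs a second pass because row three is still NULL during the first one, which is exactly the "go back and iterate again" step.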
A possible solution could look like this, but since you gave very few details, it only works for the condition above; I'm not sure about anything else.
GO
SELECT
t1.column_name,
CASE WHEN t1.formula IS NULL
THEN t1.val
ELSE
(select sum(t2.val) from table as t2 where t2.formula is null)
END as result
FROM
table as t1
GO
If this doesn't work, feel free to discuss it further.
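A quick check of that query, sketched with Python's sqlite3 as a stand-in for SQL Server (an assumption). This version has THEN after the WHEN condition and a subquery that sums the base rows (those whose formula IS NULL); `formula_table` is a hypothetical name, since `table` is reserved. It reproduces 1, 2, 3 for the sample, but a hypothetical row four = two - one still gets 3, which is the limitation the answer warns about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE formula_table (column_name TEXT, formula TEXT, val INTEGER)")
conn.executemany("INSERT INTO formula_table VALUES (?, ?, ?)",
                 [("one", None, 1), ("two", None, 2), ("three", "one + two", None)])

QUERY = """
    SELECT t1.column_name,
           CASE WHEN t1.formula IS NULL
                THEN t1.val
                -- Sums every base value, whatever the formula text says.
                ELSE (SELECT SUM(t2.val) FROM formula_table AS t2
                      WHERE t2.formula IS NULL)
           END AS result
    FROM formula_table AS t1
    ORDER BY t1.rowid
"""
before = conn.execute(QUERY).fetchall()
print(before)  # [('one', 1), ('two', 2), ('three', 3)]

# A formula that is not "sum of all base values" still gets 3.
conn.execute("INSERT INTO formula_table VALUES ('four', 'two - one', NULL)")
after = conn.execute(QUERY).fetchall()
print(after[-1])  # ('four', 3)
```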

Identify the last occurrence of a substring in a string, where the substring is from a table. Teradata

I have the following problem:
I need to identify the last occurrence of any substring listed in table A, and return the value associated with it in the select list of another query. This is a bit convoluted, but here is the code:
SELECT TRIM(COUNTRY_CODE)
FROM (
SELECT TOP 1 POSITION( PHRASE IN MY_STRING) AS PHRASE_LOCATION, CODE
FROM REFERENCE_TABLE -- Where the country list is located
WHERE PHRASE_LOCATION > 0 -- To return NULL if there is no matches
ORDER BY 1 DESC -- To get the last one
) t1
This works when run by itself, but I have big problems getting it to work as part of another query's select list. I need "MY_STRING" to come from a higher level in the nested select tree. The reason for this is how the system is designed at a higher level.
In other words, I need the following:
PHRASE comes from a table that holds phrases, each with an associated code
MY_STRING is used in the higher-level select, and I need to associate a code with it, based on the last occurring phrase
Number of different phrases > 400 so no hard coding :(
Number of different "MY_STRING" > 1 000 000 / day
So far I have tried what you can see above, but due to the constraints of the system, I cannot be too creative.
Example Phrases: "New York", "London", "Oslo"
Example Codes: "US", "UK", "NO"
Example Strings: "London House, Something street, New York"; "Some street x, 0120, OSLO".
Desired Outcomes: "US"; "NO"
This will result in a product join, i.e. use a lot of CPU:
SELECT MY_STRING
-- using INSTR to search for the last occurrence instead of POSITION, in case the same PHRASE occurs multiple times
-- INSTR is case sensitive -> must use LOWER
,Instr(Lower(MY_STRING), Lower(PHRASE), -1, 1) AS PHRASE_LOCATION
,CODE
,PHRASE
FROM table_with_MY_STRING
LEFT JOIN REFERENCE_TABLE -- to return NULL if no match
ON PHRASE_LOCATION > 0
QUALIFY
Row_Number() -- return last match
Over (PARTITION BY MY_STRING
ORDER BY PHRASE_LOCATION DESC) = 1
If this is not efficient enough, another possible solution might use STRTOK_SPLIT_TO_TABLE/REGEXP_SPLIT_TO_TABLE: split the address into parts and then join those parts to PHRASE.
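The ranking the QUALIFY clause performs can be sketched outside the database to check the desired outcomes. This is plain Python, not Teradata, and `last_phrase_code` is a hypothetical helper: it lowercases both sides (like the LOWER calls), finds the last position of each phrase (like INSTR with -1), keeps the phrase with the highest position (like ROW_NUMBER ordered DESC = 1), and returns None when nothing matches (like the LEFT JOIN). The second example string is assumed to read OSLO with a letter O.

```python
def last_phrase_code(my_string, reference):
    """Return the code of the phrase that starts last in my_string, or None."""
    haystack = my_string.lower()
    best_pos, best_code = -1, None
    for phrase, code in reference:
        pos = haystack.rfind(phrase.lower())  # last occurrence, -1 if absent
        if pos > best_pos:
            best_pos, best_code = pos, code
    return best_code


reference = [("New York", "US"), ("London", "UK"), ("Oslo", "NO")]
print(last_phrase_code("London House, Something street, New York", reference))  # US
print(last_phrase_code("Some street x, 0120, OSLO", reference))  # NO
print(last_phrase_code("Nowhere in particular", reference))  # None
```

Like the SQL version, this compares every string with every phrase, so it is the same product-join cost, just easier to test by hand.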

What is "Select -1", and how is it different from "Select 1"?

I have the following query that is part of a common table expression. I don't understand the function of the "Select -1" statement. It is obviously different than the "Select 1" that is used in "EXISTS" statements. Any ideas?
select days_old,
count(express_cd),
count(*),
case
when round(count(express_cd)*100.0/count(*),2) < 1 then '0'
else ''
end ||
cast(decimal(round(count(express_cd)*100.0/count(*),2),5,2) as varchar(7)) ||
'%'
from foo.bar
group by days_old
union all
select -1, -- Selecting the -1 here
count(express_cd),
count(*),
case
when round(count(express_cd)*100.0/count(*),2) < 1 then '0'
else ''
end ||
cast(decimal(round(count(express_cd)*100.0/count(*),2),5,2) as varchar(7)) ||
'%'
from foo.bar
where days_old between 1 and 7
It's just selecting the number "minus one" for each row returned, just like "select 1" will select the number "one" for each row returned.
There is nothing special about the "select 1" syntax used in EXISTS statements, by the way; it's just selecting an arbitrary value because EXISTS only checks whether a row is returned, and a SELECT needs something in its column list; the number 1 is sufficient.
Why you would do this, I have no idea.
When you have a union statement, each part of the union must return the same number of columns, with compatible types. From what I read when I look at this, the first statement gives you one line for each days_old value, with some stats for each. The second part of the union gives you a summary of all the records that are between 1 and 7 days old. Since the days_old column is not relevant there, they put in a fake value as a placeholder in order to do the union. Of course, this is just a guess based on reading thousands of queries through the years. To be sure, I would need to actually run the code.
Since you say this is a CTE, to really understand why this is happening, you may need to look at the data it generates and how that data is used in the next query that uses the CTE. That might answer your question.
What you have asked is basically about a business rule unique to your company. The true answer should lie in any requirements documents for the original creation of the code. You should go look for them and read them. We can make guesses based on our own experience but only people in your company can answer the why question here.
If you can't find the documentation, then you need to talk (yes, directly talk, preferably in person) to the stakeholders who use the data and find out what their needs were. Only do this after running the code and analyzing the results to better understand the meaning of the data returned.
Based on your query, all the records with days_old between 1 and 7 are summarized in one extra row whose days_old is -1. That is all select -1 does; there is nothing special here. There is also no difference between select -1 and select 1 inside EXISTS: both just return a constant, and EXISTS only checks whether any row comes back.
Back to your query: you have a UNION ALL, and the four columns selected in each branch line up with each other. I am guessing your task is to get a result per days_old value and then combine it with one extra row that covers all days_old values between 1 and 7.
It is just grouping logic.
Your query returns aggregated data (counts and rounded percentages) grouped by the days_old column, plus one more group for the data where days_old is between 1 and 7.
So -1 is just one additional group; it cannot be 1 because days_old=1 is already a valid group.
The result will look like this:
row1: days_old=1 count(*)=2 ...
row2: days_old=3 count(*)=5 ...
row3: days_old=9 count(*)=6 ...
row4: days_old=-1 count(*)=7 ...
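The sample result above can be reproduced with a cut-down sketch. It uses Python's sqlite3 instead of the original database (an assumption), a hypothetical `bar` table filled to match those counts, and only COUNT(*) instead of the percentage columns. It also shows that, inside EXISTS, selecting 1 or -1 makes no difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for foo.bar: 2 rows aged 1 day, 5 aged 3, 6 aged 9.
conn.execute("CREATE TABLE bar (days_old INTEGER)")
conn.executemany("INSERT INTO bar VALUES (?)", [(1,)] * 2 + [(3,)] * 5 + [(9,)] * 6)

rows = conn.execute("""
    SELECT days_old, COUNT(*) FROM bar GROUP BY days_old
    UNION ALL
    -- -1 is only a placeholder so both branches have the same columns;
    -- it labels the extra summary row for days_old 1..7.
    SELECT -1, COUNT(*) FROM bar WHERE days_old BETWEEN 1 AND 7
""").fetchall()
print(rows)  # rows for days_old 1, 3, 9 plus (-1, 7)

# Inside EXISTS the selected value is irrelevant: 1 and -1 behave the same.
same = conn.execute("""
    SELECT EXISTS (SELECT 1 FROM bar WHERE days_old > 7),
           EXISTS (SELECT -1 FROM bar WHERE days_old > 7)
""").fetchone()
print(same)  # (1, 1)
```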