union unusual behavior - sql

Trying to union two tables with the same field into one master table but for some reason im getting a weird result.
select count(*)
from staging.sandoval_parcels
where parcel_id = 0;
returns 0
select count(*)
from staging.bernalillo_parcels
where parcel_id = 0;
returns 0
but when i merge the tables using
CREATE TABLE staging.master_parcels
AS
SELECT * FROM bernalillo_parcels
UNION ALL
SELECT * FROM sandoval_parcels
;
then
select count(*)
from staging.master_parcels
where parcel_id = 0;
returns 85553
both tables have the same fields and the fields are the same data type,also, no of the values for any field are missing, thus no nulls, why am i getting ids = 0 when either of the table have parcel_ids = 0?

The order of the fields matter, replace the * for the explicit name, other wise the second query field will be inserted on the first query position. But not necessarily on the same field you want.
CREATE TABLE staging.master_parcels
AS
SELECT parcel_id, field1 ... FROM bernalillo_parcels
UNION ALL
SELECT parcel_id, field1 ... FROM sandoval_parcels
;

Union will merge tables even if the column order is not the same. If all of the columns match and are in the same order, it will union distinct values and not create duplicates if the rows are the same for each table. Having the order and data type be the same is important.

Related

Save value in local variable HANA SQL Script

I'm trying to take value from a non-empty row and overwrite it in the subsequent rows until another non-empty row appears and then write that in the subsequent rows. Coming from ABAP Background, I'm not sure how to accomplish this in HANA SQL Script. Here's a picture to show what the data looks like.
Basically 'Doe, John' should be overwritten into all the empty rows until 'Doe, Jane' appears and then 'Doe, Jane' should be overwritten into empty rows until another name appears.
My idea is to store the non-empty row in a local variable, but I haven't had much success so far. Here's my code:
tempTab1 = SELECT
CASE WHEN EMPLOYEE <> ''
THEN lv_emp = EMPLOYEE
ELSE EMPLOYEE
END AS EMPLOYEE,
FROM :tempTab;
In general, rows in dataset are unordered until you explicitly specify ORDER BY part of SQL. If you observe some order it may be a side-effect and can vary. So first of all you have to explicitly create a row number column (assume it's name is RECORD).
Then you should go this way:
Select only rows with non-empty data in column.
Use LEAD(RECORD) over(order by RECORD) to identify the next non-empty record number.
Join your source dataset to dataset defined on step 3 on between condition for RECORD field.
with a as (
select 1 as record, 'Val1' as field1 from dummy union
select 2 as record, '' as field1 from dummy union
select 3 as record, '' as field1 from dummy union
select 4 as record, 'Val2' as field1 from dummy union
select 5 as record, '' as field1 from dummy union
select 6 as record, '' from dummy union
select 7 as record, '' from dummy union
select 8 as record, 'Val3' as field1 from dummy
)
, fill_base as (
select field1, record, lead(record, 1, record) over(order by record asc) as next_record
from a
where field1 <> '' and field1 is not null
)
select
a.record
, case
when a.field1 = '' or a.field1 is null
then f.field1
else a.field1
end as field1
, a.field1 as field1_original
from a
left join fill_base as f
on a.record > f.record
and a.record < f.next_record
The performance in HANA may be bad in some cases since it process window functions very bad.
Here is another more elegant solution with two nested window functions than does not force you to write multiple selects for each column: How to make LAG() ignore NULLS in SQL Server?
You can use window aggregate function LAST_VALUE to achieve the imputation of missing values.
Sample Data
CREATE TABLE sample (id integer, sort integer, value varchar(10));
INSERT INTO sample VALUES (4711, 1, 'Hello');
INSERT INTO sample VALUES (4712, 2, null);
INSERT INTO sample VALUES (4713, 3, null);
INSERT INTO sample VALUES (4714, 4, 'World');
INSERT INTO sample VALUES (4715, 5, null);
INSERT INTO sample VALUES (4716, 6, '!');
Generate a new column with imputed values
SELECT base.*, LAST_VALUE(fill.value ORDER BY fill.sort) AS value_imputed
FROM sample base
LEFT JOIN sample fill ON fill.sort <= base.sort AND fill.value IS NOT NULL
GROUP BY base.id, base.sort, base.value
ORDER BY base.id, base.sort
Result
Note that sort could be anything determining the order (e.g. a timestamp).

Hive Query with a large WHERE Condition

I am writing a HIVE query to pull about 2,000 unique keys from a table.
I keep getting this error - java.lang.StackOverflowError
My query is basic but looks like this:
SELECT * FROM table WHERE (Id = 1 or Id = 2 or Id = 3 Id = 4)
my WHERE clause goes all the way up to 2000 unique id's and I receive the error above. Does anyone know of a more efficient way to do this or get this query to work?
Thanks!
You may use the SPLIT and EXPLODE to convert the comma separated string to rows and then use IN or EXISTS.
using IN
SELECT * FROM yourtable t WHERE
t.ID IN
(
SELECT
explode(split('1,2,3,4,5,6,1998,1999,2000',',')) as id
) ;
Using EXISTS
SELECT * FROM yourtable t WHERE
EXISTS
(
SELECT 1 FROM (
SELECT
explode(split('1,2,3,4,5,6,1998,1999,2000',',')) as id
) s
WHERE s.id = t.id
);
Make use of the Between clause instead of specifying all unique ids:
SELECT ID FROM table WHERE ID BETWEEN 1 AND 2000 GROUP BY ID;
i you can create a table for these IDs and after use the condition of exist in the new table to get only your specific IDs

How can I select multiple copies of the same row?

I have a table in MS Access with rows which have a column called "repeat"
I want to SELECT all the rows, duplicated by their "repeat" column value.
For example, if repeat is 4, then I should return 4 rows of the same values. If repeat is 1, then I should return only one row.
This is very similar to this answer:
https://stackoverflow.com/a/6608143
Except I need a solution for MS Access.
First create a "Numbers" table and fill it with numbers from 1 to 1000 (or up to whatever value the "Repeat" column can have):
CREATE TABLE Numbers
( i INT NOT NULL PRIMARY KEY
) ;
INSERT INTO Numbers
(i)
VALUES
(1), (2), ..., (1000) ;
then you can use this:
SELECT t.*
FROM TableX AS t
JOIN
Numbers AS n
ON n.i <= t.repeat ;
If repeat has only small values you can try:
select id, col1 from table where repeat > 0
union all
select id, col1 from table where repeat > 1
union all
select id, col1 from table where repeat > 2
union all
select id, col1 from table where repeat > 3
union all ....
What you can do is to retrieve the one 'unique' row and copy this row/column into a string however many copies you need from it using a for loop.

Firebird SQL challenge - return one row that has the data when select returned two rows

I have a quite unique need to make select always return one row
My SQL:
select * from table1 Where (table1.pk = :p1) or (table1.fk1 = :p1)
The above SQL always has two cases for return:
1- my Select return two records:
The only different is one of the records has data while the other has only the ID filled with data while the rest of its fields are null. I need in this case to return only the one that has data in other fields.
2- my Select return one record
In this case the record returned has only the ID field filled with data while the rest of the fields are null however this is what I want and no need for any further processing.
Please advise if is it possible to do that in one plain Select SQL. I can not use stored procedure.
You can use the first clause of the select statement to get only 1 row.
Given your specific conditions, you can order the result set descending by the rest of the fields to be sure the null row is selected only in case there's no data row (null goes first in firebird 2.5, but AFAIK this changed somewhere in the last versions, so check your specific version before applying this).
Your final query will look like this:
select first 1 *
from table1
where (table1.pk = :p1)
or (table1.fk1 = :p1)
order by somecolumn;
somecolumn being the most relevant of the other fields that can contain null values.
you can test this with this statements:
--two rows, one with ID and values and the other with ID and null
with q1 as (
select 1 id, 'data' othercolumn
from rdb$database
union
select 2 id, null othercolumn
from rdb$database
)
select first 1 *
from q1
order by othercolumn nulls last;
--versus:
--onw row with ID and null
with q1 as (
select 2 id, null othercolumn
from rdb$database
)
select first 1 *
from q1
order by othercolumn nulls last;

I am trying to return a certain values in each row which depend on whether different values in that row are already in a different table

I'm still a n00b at SQL and am running into a snag. What I have is an initial selection of certain IDs into a temp table based upon certain conditions:
SELECT DISTINCT ID
INTO #TEMPTABLE
FROM ICC
WHERE ICC_Code = 1 AND ICC_State = 'CA'
Later in the query I SELECT a different and much longer listing of IDs along with other data from other tables. That SELECT is about 20 columns wide and is my result set. What I would like to be able to do is add an extra column to that result set with each value of that column either TRUE or FALSE. If the ID in the row is in #TEMPTABLE the value of the additional column should read TRUE. If not, FALSE. This way the added column will ready TRUE or FALSE on each row, depending on if the ID in each row is in #TEMPTABLE.
The second SELECT would be something like:
SELECT ID,
ColumnA,
ColumnB,
...
NEWCOLUMN
FROM ...
NEWCOLUMN's value for each row would depend on whether the ID in that row returned is in #TEMPTABLE.
Does anyone have any advice here?
Thank you,
Matt
If you left join to the #TEMPTABLE you'll get a NULL where the ID's don't exist
SELECT ID,
ColumnA,
ColumnB,
...
T.ID IS NOT NULL AS NEWCOLUMN -- Gives 1 or 0 or True/false as a bit
FROM ... X
LEFT JOIN #TEMPTABLE T
ON T.ID = X.ID -- DEFINE how the two rows can be related unquiley
You need to LEFT JOIN your results query to #TEMPTABLE ON ID, this will give you the ID if there is one and NULL if there isn't, if you want 1 or 0 this would do it (For SQL Server) ISNULL(#TEMPTABLE.ID,0)<>0.
A few notes on coding for performance:
By definition an ID column is unique so the DISTINCT is redundant and causes unnecisary processing (unless it is an ID from another table)
Why would you store this to a temporary table rather than just using it in the query directly?
You could use a union and a subquery.
Select . . . . , 'TRUE'
From . . .
Where ID in
(Select id FROM #temptable)
UNION
SELECT . . . , 'FALSE'
FROM . . .
WHERE ID NOT in
(Select id FROM #temptable)
So the top part, SELECT ... FROM ... WHERE ID IN (Subquery), does a SELECT if the ID is in your temptable.
The bottom part does a SELECT if the ID is not in the temptable.
The UNION operator joins the two results nicely, since both SELECT statements will return the same number of columns.
To expand on what someone else was saying with Union, just do something like so
SELECT id, TRUE AS myColumn FROM `table1`
UNION
SELECT id, FALSE AS myColumn FROM `table2`