Eliminate NULL values - sql

I have the following data:
A B C D E F
NULL 1122111 NULL 0 NULL XBK
9226978 NULL 0 NULL XGI NULL
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
Now I need to collapse that to a single row with the below results:
A B C D E F
9226978 1122111 0 0 XGI XBK
I have no idea where to get started. Please help.

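-- Aggregate functions ignore NULLs, so MAX() returns the one non-NULL value per column.
SELECT MAX(A) AS A, MAX(B) AS B, MAX(C) AS C,
       MAX(D) AS D, MAX(E) AS E, MAX(F) AS F
FROM Table_Name;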

Try this:
SELECT COALESCE(A,0), COALESCE(B,0), COALESCE(C,0), COALESCE(D,0), COALESCE(E,0), COALESCE(F,0)
FROM YOUR_TABLE;
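To compare the two suggestions side by side, here is a minimal sketch that loads the sample data into an in-memory SQLite database through Python's sqlite3 module. SQLite and the table name demo are assumptions standing in for the asker's database. It shows that MAX() collapses the six rows into one, while COALESCE() only substitutes values and still returns six rows. If a column could hold more than one non-NULL value, MAX() would keep only the largest.

```python
import sqlite3

# In-memory SQLite stands in for the asker's database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (A INTEGER, B INTEGER, C INTEGER, D INTEGER, E TEXT, F TEXT)"
)
rows = [
    (None, 1122111, None, 0, None, "XBK"),
    (9226978, None, 0, None, "XGI", None),
] + [(None,) * 6] * 4  # the four all-NULL rows from the question
conn.executemany("INSERT INTO demo VALUES (?, ?, ?, ?, ?, ?)", rows)

# Aggregates skip NULLs, so each MAX() finds the single non-NULL value.
collapsed = conn.execute(
    "SELECT MAX(A), MAX(B), MAX(C), MAX(D), MAX(E), MAX(F) FROM demo"
).fetchall()

# COALESCE works row by row: it replaces NULLs but does not merge rows.
coalesced = conn.execute(
    "SELECT COALESCE(A, 0), COALESCE(B, 0) FROM demo"
).fetchall()

print(collapsed)
print(len(coalesced))
```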

Related

How to change all non-null values in a column to the column name in SQL?

So let's say we have the following:
ID data1 data2 data3
001 carl NULL NULL
002 NULL rick NULL
003 NULL mitch NULL
004 NULL NULL NULL
All I want to do is replace every non-null value with the name of its column. Something like this, in Snowflake:
ID data1 data2 data3
001 data1 NULL NULL
002 NULL data2 NULL
003 NULL data2 NULL
004 NULL NULL NULL
My rows have distinct IDs, and there are a few columns I don't want this applied to. Any ideas on how to tackle this in SQL?
select id,
case when data1 is not null then 'data1' else null end as data1,
...
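As a rough illustration of the CASE pattern applied to the sample table, the sketch below builds one CASE expression per target column and runs it against an in-memory SQLite database. SQLite, the table name demo, and the targets list are assumptions for the demo; the same CASE expressions should carry over to Snowflake. Leaving a column out of targets is one way to exclude the columns that should stay untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id TEXT, data1 TEXT, data2 TEXT, data3 TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [
        ("001", "carl", None, None),
        ("002", None, "rick", None),
        ("003", None, "mitch", None),
        ("004", None, None, None),
    ],
)

# Columns to rewrite; anything not listed (such as id) is selected as-is.
# The names are trusted constants here, so building SQL from them is safe.
targets = ["data1", "data2", "data3"]
case_list = ",\n       ".join(
    f"CASE WHEN {col} IS NOT NULL THEN '{col}' END AS {col}" for col in targets
)
query = f"SELECT id,\n       {case_list}\nFROM demo ORDER BY id"

# A CASE without ELSE yields NULL, so NULL inputs stay NULL.
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```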

Selecting all rows conditionally between 2 arbitrary column values in SQL Server

I've joined a number of tables to get to this table. From it I need to select all of the b_id values that fall between the rows whose start and end values are not null. There could be multiple start and end values in the table. How can I write a SQL Server query that selects all of the b_ids between, but not including, those rows? For this example table I would need the b_ids 99396 and 71828.
I tried to find a similar question and found something like this, but I don't believe I'm using the correct values where they need to be. Is there another way to do it? I have a solution using a cursor, but I'm trying to find a non-cursor solution. My friend told me the responses on here can be brutal if you don't word the question a certain way. Please be easy on me lol.
a_id | b_id | sequence | start | end |
---------+-------+----------+-------+-------+
3675151 | 68882 | 1 | null | null |
3675151 | 79480 | 2 | 79480 | null |
3675151 | 99396 | 3 | null | null |
3675151 | 71828 | 4 | null | null |
3675151 | 28911 | 5 | null | 28911 |
3675151 | 27960 | 6 | null | null |
3675183 | 11223 | 1 | null | null |
3675183 | 77810 | 2 | null | null |
3675183 | 11134 | 3 | null | null |
3675183 | 90909 | 4 | null | null |
Is this what you are looking for?
-- [table] and [end] are bracketed because TABLE and END are reserved words in SQL Server.
-- Assumes each a_id has at most one start row and one end row.
select t1.a_id, t1.b_id, t1.sequence
from [table] t1
where t1.sequence >
      (select t2.sequence from [table] t2
       where t2.a_id = t1.a_id and t2.start is not null)
  and t1.sequence <
      (select t3.sequence from [table] t3
       where t3.a_id = t1.a_id and t3.[end] is not null);
Would this be it?
Mark it as the answer if so; if not, please give an example of what is different.
create table #table (
a_id int
,b_id int
,c_sequence int
,c_start int
,c_end int
)
insert into #table
values
(3675151 ,68882 , 1 , null , null )
,(3675151 ,79480 , 2 , 79480 , null )
,(3675151 ,99396 , 3 , null , null )
,(3675151 ,71828 , 4 , null , null )
,(3675151 ,28911 , 5 , null , 28911)
,(3675151 ,27960 , 6 , null , null )
,(3675183 ,11223 , 1 , null , null )
,(3675183 ,77810 , 2 , 4343 , null )
,(3675183 ,11134 , 3 , null , null )
,(3675183 ,90939 , 4 , null , 1231 )
select
    t.*
from #table t
where
    exists (select t1.b_id, t1.c_sequence
            from #table t1
            where t1.c_start is not null
              and t.a_id = t1.a_id
              and t.c_sequence > t1.c_sequence)
    and exists (select t1.b_id, t1.c_sequence
                from #table t1
                where t1.c_end is not null
                  and t.a_id = t1.a_id
                  and t.c_sequence < t1.c_sequence);
You can use window functions for this:
select t.*
from (select t.*,
max(case when c_start is not null then c_sequence end) over (partition by a_id order by c_sequence) as last_c_start,
max(case when c_end is not null then c_sequence end) over (partition by a_id order by c_sequence) as last_c_end,
min(case when c_end is not null then c_sequence end) over (partition by a_id order by c_sequence desc) as next_c_end
from t
) t
where c_sequence > last_c_start and
c_sequence < next_c_end and
(last_c_start > last_c_end or last_c_end is null);
Here is a db<>fiddle.
The subquery returns the previous start and the next end. That is pretty simple. The where clause then uses this information; its last condition just checks that the most recent "start" is the one that should be considered.
Note: This does not handle more complicated scenarios like start-->start-->end-->end. If that is a possibility, you should ask another question.
EDIT:
Actually, there is an even easier way:
select t.*
from (select t.*,
count(coalesce(c_start, c_end)) over (partition by a_id order by c_sequence) as counter
from t
) t
where c_start is null and c_end is null and
counter % 2 = 1;
This returns rows where both values are NULL (to avoid the endpoints) and there is an odd number of non-NULL c_start/c_end values up to that row.
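To make the parity idea concrete, here is a small sketch that runs the counter query on the rows from the question, using an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is an assumption standing in for SQL Server, and the table is named t with the c_ prefixed column names used above. The counter column is included in the output so the running count is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (a_id INTEGER, b_id INTEGER, c_sequence INTEGER, "
    "c_start INTEGER, c_end INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (3675151, 68882, 1, None, None),
        (3675151, 79480, 2, 79480, None),
        (3675151, 99396, 3, None, None),
        (3675151, 71828, 4, None, None),
        (3675151, 28911, 5, None, 28911),
        (3675151, 27960, 6, None, None),
        (3675183, 11223, 1, None, None),
        (3675183, 77810, 2, None, None),
        (3675183, 11134, 3, None, None),
        (3675183, 90909, 4, None, None),
    ],
)

# counter = how many start/end markers have been seen so far within the a_id.
# An odd count means a start has been passed but its end has not.
query = """
SELECT a_id, b_id, c_sequence, counter
FROM (SELECT t.*,
             COUNT(COALESCE(c_start, c_end))
                 OVER (PARTITION BY a_id ORDER BY c_sequence) AS counter
      FROM t) AS marked
WHERE c_start IS NULL AND c_end IS NULL AND counter % 2 = 1
ORDER BY a_id, c_sequence
"""
between = conn.execute(query).fetchall()
for row in between:
    print(row)
```

Only the two rows between sequence 2 and sequence 5 of a_id 3675151 come back; the second a_id has no markers, so its counter stays at 0.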

how to concatenate multiple columns in oracle sql into single column only for matching condition

I have the 4 columns below:
empid | name | dept | ph_no
---------------------------------
123 | null | null | null
124 | mike | science | null
125 | null | physics | 789
126 | null | null | 463
127 | john | null | null
I need to merge all 4 columns into a single column that lists only the columns that are null.
I need something like this:
empid
------------
123 is missing name,dept,ph_no
124 is missing ph_no
125 is missing name
126 is missing name,dept
127 is missing dept,ph_no
This can be done with case expressions.
select empid,empid||' is missing '||
trim(',' from
(case when name is null then 'name,' else '' end||
case when dept is null then 'dept,' else '' end||
case when ph_no is null then 'ph_no' else '' end
)
)
from tbl
I agree with Vamsi and would just like to add a where clause so the "complete" rows won't be returned.
select empid,empid||' is missing '||
       trim(',' from
            (case when name is null then 'name,' else '' end||
             case when dept is null then 'dept,' else '' end||
             case when ph_no is null then 'ph_no' else '' end
            )
       )
from tbl
where (name is null or dept is null or ph_no is null);
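You can also use the NVL2 function.
-- NVL2(x, NULL, 'label') yields the label only when x is NULL; RTRIM strips the trailing comma.
SELECT empid||' is missing '||RTRIM(NVL2(name, NULL, 'name,')||NVL2(dept, NULL, 'dept,')||NVL2(ph_no, NULL, 'ph_no'), ',') empid
FROM table_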
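For anyone who wants to check the CASE approach against the sample rows without an Oracle instance, here is a minimal sketch using an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption standing in for Oracle, and RTRIM(x, ',') takes the place of Oracle's TRIM(',' FROM x); the CASE expressions themselves are unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (empid INTEGER, name TEXT, dept TEXT, ph_no INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (123, None, None, None),
        (124, "mike", "science", None),
        (125, None, "physics", 789),
        (126, None, None, 463),
        (127, "john", None, None),
    ],
)

# Each CASE contributes its label only when the column is NULL;
# RTRIM removes the comma left behind when ph_no is present.
query = """
SELECT empid || ' is missing ' ||
       RTRIM(CASE WHEN name  IS NULL THEN 'name,' ELSE '' END ||
             CASE WHEN dept  IS NULL THEN 'dept,' ELSE '' END ||
             CASE WHEN ph_no IS NULL THEN 'ph_no' ELSE '' END, ',')
FROM tbl
WHERE name IS NULL OR dept IS NULL OR ph_no IS NULL
ORDER BY empid
"""
messages = [row[0] for row in conn.execute(query)]
for message in messages:
    print(message)
```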

Loading XML File data into Hive tables

I want to load an XML file into a Hive column, but I get NULLs when I run a select query on the Hive table.
Can anyone help?
Create table statement
CREATE TABLE `clobtest_h`(
`id` double,
`subject` string,
`body` string,
`purge_id` double,
`purge_date` timestamp,
`s_retention_applied` string,
`d_primary_column` double)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://nameservice1/user/hive/warehouse/support.db/clobtest_h'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='false',
'last_modified_by'='root',
'last_modified_time'='1488897940',
'numFiles'='0',
'numRows'='-1',
'rawDataSize'='-1',
'totalSize'='0',
'transient_lastDdlTime'='1488897940')
Insert query
insert into clobtest_h values(2,'Testing issue','<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
<product description="Cardigan Sweater" product_image="cardigan.jpg">
<catalog_item gender="Mens">
<item_number>QWZ5671</item_number>
<price>39.95</price>
<size description="Medium">
<color_swatch image="red_cardigan.jpg">Red</color_swatch>
<color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
</size>
<size description="Large">
<color_swatch image="red_cardigan.jpg">Red</color_swatch>
<color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
</size>
</catalog_item>
<catalog_item gender="Womens">
<item_number>RRX9856</item_number>
<price>42.50</price>
<size description="Small">
<color_swatch image="red_cardigan.jpg">Red</color_swatch>
<color_swatch image="navy_cardigan.jpg">Navy</color_swatch>
<color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
</size>
<size description="Medium">
<color_swatch image="red_cardigan.jpg">Red</color_swatch>
<color_swatch image="navy_cardigan.jpg">Navy</color_swatch>
<color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
<color_swatch image="black_cardigan.jpg">Black</color_swatch>
</size>
<size description="Large">
<color_swatch image="navy_cardigan.jpg">Navy</color_swatch>
<color_swatch image="black_cardigan.jpg">Black</color_swatch>
</size>
<size description="Extra Large">
<color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
<color_swatch image="black_cardigan.jpg">Black</color_swatch>
</size>
</catalog_item>
</product>
</catalog>',1234.0,'2017-03-07 20:15:04','N',6.0)
Select query on the table; after the first line is fetched, the remaining rows come back as NULLs
"select * from support.clobtest_h"
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars/hive-common-1.1.0-cdh5.6.0.jar!/hive-log4j.properties
OK
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars/parquet-pig-bundle-1.5.0-cdh5.6.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars/parquet-format-2.1.0-cdh5.6.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars/parquet-hadoop-bundle-1.5.0-cdh5.6.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars/hive-exec-1.1.0-cdh5.6.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars/hive-jdbc-1.1.0-cdh5.6.0-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
2.0 Testing issue <?xml version="1.0"?> NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL 1234.0 2017-03-07 20:15:04 NULL NULL NULL NULL
Time taken: 1.878 seconds, Fetched: 42 row(s)
Mar 8, 2017 1:37:37 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Mar 8, 2017 1:37:37 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 42 records.
Mar 8, 2017 1:37:37 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Mar 8, 2017 1:37:37 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 21 ms. row count = 42
I have set the below property in hive-site.xml, which resolved the issue.
<property>
<name>hive.query.result.fileformat</name>
<value>SequenceFile</value>
</property>
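One plausible reading of why this setting helps, offered as an assumption rather than something the thread confirms: if query results pass through a line-delimited text file, every newline inside the XML body starts a new row, which would match the 42 fetched rows for a value that spans 42 lines. The sketch below imitates that effect in plain Python. The tab delimiter and the write_row and read_rows helpers are hypothetical and are not Hive code.

```python
# Hypothetical stand-in for a line-delimited result file: tab-separated
# fields, one row per line. None of this is Hive code.
NUM_COLUMNS = 7


def write_row(fields):
    """Serialize one row as a single tab-separated line."""
    return "\t".join(fields) + "\n"


def read_rows(text):
    """Parse rows back, treating every line as a separate row."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        # Pad short rows with None, like a reader filling missing columns.
        parts += [None] * (NUM_COLUMNS - len(parts))
        rows.append(parts[:NUM_COLUMNS])
    return rows


# A shortened 3-line body in place of the full catalog document.
body = '<?xml version="1.0"?>\n<catalog>\n</catalog>'
stored = write_row(
    ["2.0", "Testing issue", body, "1234.0", "2017-03-07 20:15:04", "N", "6.0"]
)
rows = read_rows(stored)

# One logical row comes back as three, with the trailing fields
# shifted onto the last line.
for row in rows:
    print(row)
```

A binary container format does not rely on newlines to delimit records, which would explain why the multi-line value survives once the result format is changed.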

How to fix this pivot query...?

I have this query:
with cte1 as (
select id,
row_number() over (partition by [Id] order by id) as row,
first_name + ' ' + last_name as [Contact Name]
from contacts
where is_company = 0 and is_active = 1
),
companyContacts as (
select * from cte1 where row < 6
)
select company.company_name,
c.[1] as contact_1,
c.[2] as contact_2,
c.[3] as contact_3,
c.[4] as contact_4,
c.[5] as contact_5
from contacts company
left join contact_company_relation_additional_information relation
on company.id = relation.company_id and relation.ContactCompanyRelation_IsActive = 1
left outer join
(select *
from companyContacts
pivot (min([Contact Name]) for row in ([1],[2],[3],[4],[5])) x
) c on c.id = relation.contact_id
where is_company = 1 and is_active = 1
order by company.company_name
That brings me the data this way:
company_name contact_1 contact_2 contact_3 contact_4 contact_5
Analist Ori Reshef NULL NULL NULL NULL
Analist Ben Gurion NULL NULL NULL NULL
Analist Ofer Jerus NULL NULL NULL NULL
Bar Net Maya Leshe NULL NULL NULL NULL
Bar Net Yossi Farc NULL NULL NULL NULL
Bar Net Dima Brods NULL NULL NULL NULL
For some reason the contacts end up in different rows, and only in the "contact_1" column.
But I need only one row for each company name, with the contacts in 5 different columns. Like this:
company_name contact_1 contact_2 contact_3 contact_4 contact_5
Analist Ori Reshef Ben Gurion Ofer Jerus NULL NULL
Bar Net Maya Leshe Yossi Farc Dima Brods NULL NULL
Can someone tell me how I can fix this query to return the data the way I need it?
My table structure is:
table Contacts: [id], [is_company], [first_name], [last_name], [company_name]
table contact_company_relation: [id], [company_id], [contact_id]
Sample Data:
table contacts:
id is_company first_name last_name company_name
1 True NULL NULL Analist
2 True NULL NULL Bar Net
3 False Ori Reshef NULL
4 False Ben Gurion NULL
5 False Ofer Jerus NULL
6 False Maya Leshe NULL
7 False Yossi Farc NULL
8 False Dima Brods NULL
table contact_company_relation:
id company_id contact_id
1 1 3
2 1 4
3 1 5
4 2 6
5 2 7
6 2 8
The problem that you are having is with the following line:
row_number() over (partition by [Id] order by id) as row,
This numbers the rows of your contacts table, but you are partitioning by the id column, which appears to be unique for each row already. Every partition therefore holds a single row, so row_number() returns 1 for every contact and every name lands in contact_1.
You should partition by the company_id from the contact_company_relation table instead, so that contacts are numbered within each company.
I would also alter the code to something like the following:
select company_name,
Contact1, Contact2, Contact3,
Contact4, Contact5
from
(
select c.first_name + ' ' + c.last_name as contact_name,
comp.company_name,
'Contact'+
cast(row_number() over(partition by ccr.company_id
order by ccr.contact_id) as varchar(1)) row
from contacts c
inner join contact_company_relation ccr
on c.id = ccr.contact_id
inner join contacts comp
on ccr.company_id = comp.id
) d
pivot
(
max(contact_name)
for row in (Contact1, Contact2, Contact3,
Contact4, Contact5)
) p;
See SQL Fiddle with Demo. This gives a result:
| COMPANY_NAME | CONTACT1 | CONTACT2 | CONTACT3 | CONTACT4 | CONTACT5 |
|--------------|------------|------------|------------|----------|----------|
| Analist | Ori Reshef | Ben Gurion | Ofer Jerus | (null) | (null) |
| Bar Net | Maya Leshe | Yossi Farc | Dima Brods | (null) | (null) |
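The same numbering idea can be tried outside SQL Server. The sketch below is a rough equivalent that runs on an in-memory SQLite database through Python's sqlite3 module. SQLite has no PIVOT operator, so it swaps in conditional aggregation (MAX over a CASE per position), a different technique from the answer's PIVOT. The row_number() partitioned by company_id is the same, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER, is_company INTEGER, first_name TEXT,
                       last_name TEXT, company_name TEXT);
CREATE TABLE contact_company_relation (id INTEGER, company_id INTEGER,
                                       contact_id INTEGER);
INSERT INTO contacts VALUES
    (1, 1, NULL, NULL, 'Analist'), (2, 1, NULL, NULL, 'Bar Net'),
    (3, 0, 'Ori', 'Reshef', NULL), (4, 0, 'Ben', 'Gurion', NULL),
    (5, 0, 'Ofer', 'Jerus', NULL), (6, 0, 'Maya', 'Leshe', NULL),
    (7, 0, 'Yossi', 'Farc', NULL), (8, 0, 'Dima', 'Brods', NULL);
INSERT INTO contact_company_relation VALUES
    (1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 2, 6), (5, 2, 7), (6, 2, 8);
""")

# rn numbers the contacts within each company; each MAX(CASE ...) then
# picks the contact holding that position, or NULL if there is none.
query = """
SELECT company_name,
       MAX(CASE WHEN rn = 1 THEN contact_name END) AS contact_1,
       MAX(CASE WHEN rn = 2 THEN contact_name END) AS contact_2,
       MAX(CASE WHEN rn = 3 THEN contact_name END) AS contact_3,
       MAX(CASE WHEN rn = 4 THEN contact_name END) AS contact_4,
       MAX(CASE WHEN rn = 5 THEN contact_name END) AS contact_5
FROM (SELECT comp.company_name,
             c.first_name || ' ' || c.last_name AS contact_name,
             ROW_NUMBER() OVER (PARTITION BY ccr.company_id
                                ORDER BY ccr.contact_id) AS rn
      FROM contacts c
      JOIN contact_company_relation ccr ON c.id = ccr.contact_id
      JOIN contacts comp ON ccr.company_id = comp.id) AS numbered
GROUP BY company_name
ORDER BY company_name
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```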