Create external table from CSV on HDFS, all values come with quotes - sql

I have a CSV file on HDFS and I am trying to create an Impala table from it. The table is created, but every value still includes the surrounding double quotes (").
CREATE external TABLE abc.def
(
name STRING,
title STRING,
last STRING,
pno STRING
)
row format delimited fields terminated by ','
location 'hdfs:pathlocation'
tblproperties ("skip.header.line.count"="1") ;
The output is
name title last pno
"abc" "mr" "xyz" "1234"
"rew" "ms" "pre" "654"
I just want to create the table from the CSV file without the quotes. Where am I going wrong?
Regards,
R

One way to do that is to create a staging table that loads the file with the quotes, and then use CTAS (CREATE TABLE AS SELECT) to build the final table, cleaning each field with the replace function.
For example:
CREATE TABLE quote_stage(
id STRING,
name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
+-----+----------+
| id | name |
+-----+----------+
| "1" | "pepe" |
| "2" | "ana" |
| "3" | "maria" |
| "4" | "ramon" |
| "5" | "lucia" |
| "6" | "carmen" |
| "7" | "alicia" |
| "8" | "pedro" |
+-----+----------+
CREATE TABLE t_quote
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS SELECT replace(id,'"','') AS id, replace(name,'"','') AS name FROM quote_stage;
+----+--------+
| id | name |
+----+--------+
| 1 | pepe |
| 2 | ana |
| 3 | maria |
| 4 | ramon |
| 5 | lucia |
| 6 | carmen |
| 7 | alicia |
| 8 | pedro |
+----+--------+
Hope this helps.
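One caveat with the replace() approach: it removes every double quote, and the ',' delimiter still splits on commas that sit inside a quoted value. A minimal Python sketch, using made-up sample lines (not the asker's data), contrasts plain splitting plus quote stripping with a real CSV parser:

```python
import csv
import io

# Hypothetical sample rows; the second one has a comma inside a quoted field.
raw = '"1","pepe"\n"2","ana, maria"\n'

def split_and_strip(line):
    """Mimic FIELDS TERMINATED BY ',' followed by replace(col, '"', '')."""
    return [field.replace('"', '') for field in line.split(',')]

# What the staging table + replace() approach would effectively produce.
naive_rows = [split_and_strip(line) for line in raw.splitlines()]

# What a quote-aware CSV parser produces for the same input.
parsed_rows = list(csv.reader(io.StringIO(raw)))

print(naive_rows)
print(parsed_rows)
```

If the file never contains commas or quotes inside a value, both approaches give the same rows and the staging table is enough.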

Related

How can I convert varchar (1,2,3) to a correlating column values name in SQL Server 2014?

I am having trouble converting a varchar column filled with IDs (foreign keys) into a string of names. The names are stored in another table, keyed by the corresponding ID.
Data
x-----x------------------------x
| Id | foreign Keys (varchar) |
x-----x------------------------x
| 1 | 1,2,3 |
| 2 | 2,3,4 |
| 3 | 4 |
x-----x------------------------x
Names
x-----x-----------------x
| Id | Names (varchar)|
x-----x-----------------x
| 1 | Rick |
| 2 | Steven |
| 3 | Charly |
| 4 | Tom |
x-----x-----------------x
Basically I need the values in the table data to UPDATE to a varchar like 'Rick, Steven, Charly'.
I am working in SQL Server 2014, so I can't use the function STRING_SPLIT.
Help would be really appreciated
Thanks

Snowflake Create View with JSON (VARIANT) field as columns with dynamic keys

I am having a problem creating views in Snowflake over a table with a VARIANT field. The field stores JSON data whose keys are dynamic, and the key definitions are stored in another table. I want to create a view whose columns are derived dynamically from that foreign key.
Here is what my tables look like:
companies:
| id | name |
| -- | ---- |
| 1 | Company 1 |
| 2 | Company 2 |
invoices:
| id | invoice_number | custom_fields | company_id |
| -- | -------------- | ------------- | ---------- |
| 1 | INV-01 | {"1": "Joe", "3": true, "5": "2020-12-12"} | 1 |
| 2 | INV-01 | {"2":"Hello", "4": 1000} | 2 |
customization_fields:
| id | label | data_type | company_id |
| -- | ----- | --------- | ---------- |
| 1 | manager | text | 1 |
| 2 | reference | text | 2 |
| 3 | emailed | boolean | 1 |
| 4 | account | integer | 2 |
| 5 | due_date | date | 1 |
So I want to create a view for getting each company's invoices, something like:
CREATE OR REPLACE VIEW companies_invoices AS SELECT * FROM invoices WHERE company_id = 1
which should get a result like below:
| id | invoice_number | company_id | manager | emailed | due_date |
| -- | -------------- | ---------- | ------- | ------- | -------- |
| 1 | INV-01 | 1 | Joe | true | 2020-12-12 |
My challenge is that I do not know the keys when I write the query. If I did, I could write:
SELECT
id,
invoice_number,
company_id,
custom_fields:"1" AS manager,
custom_fields:"3" AS emailed,
custom_fields:"5" AS due_date
FROM invoices
WHERE company_id = 1
These keys and labels are stored in the customization_fields table. I have tried different approaches but have not been able to make it work.
Could anyone tell me whether this is possible? If it is, an example would really help.
You cannot do what you want to do with a view. A view has a fixed set of columns and they have specific types. Retrieving a dynamic set of columns requires some other mechanism.
If you're trying to change the number of columns or the names of the columns based on the rows in the customization_fields table, you can't do it in a view.
If you have a defined schema and just need to grab dynamic JSON properties, you may want to consider looking into Snowflake's GET function. It allows you to get any part of a JSON using a string for the path rather than using a literal path in the SQL statement. For example:
create temp table foo(v variant);
insert into foo select parse_json('{ "name":"John", "age":30, "car":null }');
-- This uses a literal path in the SQL to get to a JSON property
select v:name::string as first_name from foo;
-- This uses the GET function to get the value from a path in a string
select get(v, 'name')::string as first_name from foo;
You can replace the 'name' in the second parameter of the GET function with the value stored in the customization_fields table.
In Snowflake, you will have to use a stored procedure to retrieve a dynamic set of columns.
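Since a view's column list is fixed, one workaround is to generate the view definition from the rows of customization_fields and re-run it whenever the fields change, for example from a stored procedure or an external script. A minimal sketch in Python; the helper name, the per-company view name, and the mapping from data_type to cast type are assumptions for illustration, not something given in the question:

```python
# Rows copied from the customization_fields table in the question:
# (id, label, data_type, company_id)
CUSTOMIZATION_FIELDS = [
    (1, "manager", "text", 1),
    (2, "reference", "text", 2),
    (3, "emailed", "boolean", 1),
    (4, "account", "integer", 2),
    (5, "due_date", "date", 1),
]

# Assumed mapping from the question's data_type labels to cast types.
CAST_TYPES = {"text": "string", "boolean": "boolean",
              "integer": "integer", "date": "date"}

def build_view_sql(company_id, fields=CUSTOMIZATION_FIELDS):
    """Return CREATE VIEW text for one company (hypothetical helper)."""
    columns = ["id", "invoice_number", "company_id"]
    for field_id, label, data_type, owner in fields:
        if owner == company_id:
            cast = CAST_TYPES[data_type]
            columns.append(f'custom_fields:"{field_id}"::{cast} AS {label}')
    select_list = ",\n  ".join(columns)
    # The view name suffix is an assumption; labels are interpolated as-is,
    # so they must already be valid, trusted identifiers.
    return (
        f"CREATE OR REPLACE VIEW companies_invoices_{company_id} AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM invoices\n"
        f"WHERE company_id = {company_id}"
    )

print(build_view_sql(1))
```

The generated statement is static SQL, so the resulting view behaves like the hand-written query in the question; it just has to be regenerated when customization_fields changes.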

Replacing multiple strings from a database column with distinct replacements

I have a hive table as below:
+----+---------------+-------------+
| id | name | partnership |
+----+---------------+-------------+
| 1 | sachin sourav | first |
| 2 | sachin sehwag | first |
| 3 | sourav sehwag | first |
| 4 | sachin_sourav | first |
+----+---------------+-------------+
In this table I need to replace strings such as "sachin" with "ST" and "sourav" with "SG". I am using the following query, but it does not do what I need.
Query:
select
*,
case
when name regexp('\\bsachin\\b')
then regexp_replace(name,'sachin','ST')
when name regexp('\\bsourav\\b')
then regexp_replace(name,'sourav','SG')
else name
end as newName
from sample1;
Result:
+----+---------------+-------------+---------------+
| id | name | partnership | newname |
+----+---------------+-------------+---------------+
| 4 | sachin_sourav | first | sachin_sourav |
| 3 | sourav sehwag | first | SG sehwag |
| 2 | sachin sehwag | first | ST sehwag |
| 1 | sachin sourav | first | ST sourav |
+----+---------------+-------------+---------------+
Problem: for id = 1, the newName column should contain "ST SG". In other words, both strings should be replaced.
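A CASE expression returns only the first branch whose condition matches, so each row gets at most one replacement. You can nest the replaces instead: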
select s.*,
replace(replace(s.name, 'sachin', 'ST'), 'sourav', 'SG') as newName
from sample1 s;
You don't need regular expressions, so just use replace().
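One difference worth checking: plain substring replacement also rewrites "sachin_sourav", which the \b pattern in the question left untouched (the underscore counts as a word character). A small Python sketch of both behaviours on the question's sample names; the function names are made up for illustration:

```python
import re

# Replacement pairs from the question.
REPLACEMENTS = {"sachin": "ST", "sourav": "SG"}

def replace_anywhere(name):
    """Equivalent of the nested replace(): plain substring replacement."""
    for old, new in REPLACEMENTS.items():
        name = name.replace(old, new)
    return name

def replace_whole_words(name):
    """Apply every replacement in turn, matching whole words only."""
    for old, new in REPLACEMENTS.items():
        name = re.sub(rf"\b{re.escape(old)}\b", new, name)
    return name

for name in ["sachin sourav", "sachin sehwag", "sourav sehwag", "sachin_sourav"]:
    print(name, "->", replace_anywhere(name), "|", replace_whole_words(name))
```

If row 4 should stay unchanged, the same chaining idea should carry over to Hive by nesting regexp_replace calls that use the \\b pattern instead of replace().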

hive - show table's column details only

I have created a partitioned Hive table. When I run describe on it, I see the partition information as well as the column details. Which command shows only the column details?
create table t1 (x int, y int, s string) partitioned by (z date) stored as sequencefile;
describe t1;
+--------------------------+-----------------------+-----------------------+--+
| col_name | data_type | comment |
+--------------------------+-----------------------+-----------------------+--+
| x | int | |
| y | int | |
| s | string | |
| z | date | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| z | date | |
+--------------------------+-----------------------+-----------------------+--+
Can the last 5 rows be avoided?
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| z | date | |
Also, what does this NULL | NULL row mean?
What you're looking for is this configuration parameter:
set hive.display.partition.cols.separately=false
From the Hive documentation:
In Hive 0.10.0 and earlier, no distinction is made between partition columns and non-partition columns while displaying columns for DESCRIBE TABLE. From Hive 0.12.0 onwards, they are displayed separately.
In Hive 0.13.0 and later, the configuration parameter hive.display.partition.cols.separately lets you use the old behavior, if desired (HIVE-6689). For an example, see the test case in the patch for HIVE-6689.

Excel VBA to transpose set of rows if value exists in another column

I'm trying to find a way via VB script that will transpose rows from column A into a new sheet but only if there is a value in column B for rows that contain numbers. I have a sheet with ~75K rows on it that I need to do this for, and I tried creating pivot tables which allowed me to get the data into its current format but I need the data to be in columns.
The tricky part of this is that in column A, I only need to look at the rows that are all numbers and not the other rows that have text.
I created a sample sheet to view, where the sample data is in the SOURCE tab and what I want the data to look like in the TRANSPOSED tab.
https://docs.google.com/spreadsheets/d/1ujbaouZFqiPw0DbO78PCnz25OY2ugF1HtUqMg_J7KeI/edit?usp=sharing
Any help would be appreciated.
UPDATE and Answer:
I modified my approach and went back to the original source data which was not part of a pivot table and was able to use a simple match formula between the 2 data sources. So, my original data looked like this:
+----------------+---------+--------+--------------+
| Gtin | Brand | Name | TaxonomyText |
+----------------+---------+--------+--------------+
| 00030085075605 | brand 1 | name 1 | cat1 |
| 00041100015112 | brand 2 | name 2 | cat2 |
| 00041100015099 | brand 3 | name 3 | cat3 |
| 00030085075608 | brand 4 | name 4 | cat4 |
+----------------+---------+--------+--------------+
I had another sheet containing the data I needed to match to in this format:
+----------------+---------+
| Gtin | Brand |
+----------------+---------+
| 00030085075605 | brand 1 |
| 00041100015112 | brand 2 |
| 00041100015098 | brand 3 |
| 00030085075608 | brand 4 |
+----------------+---------+
I created a new column in my source sheet and used an IFERROR/MATCH formula:
=IFERROR(IF(MATCH(A14,data_to_match!$A:$A,0),"yes",),"no")
I then copied this formula down every row (about 75K rows), which very quickly filled the column with a yes or a no.
+----------------+---------+---------+--------+--------------+
| Gtin | matched | Brand | Name | TaxonomyText |
+----------------+---------+---------+--------+--------------+
| 00030085075605 | yes | brand 1 | name 1 | cat1 |
| 00041100015112 | yes | brand 2 | name 2 | cat2 |
| 00041100015099 | no | brand 3 | name 3 | cat3 |
| 00030085075608 | yes | brand 4 | name 4 | cat4 |
+----------------+---------+---------+--------+--------------+
The final step was to just filter for Yes values and I had all the data that I needed.
My mistake was going to a pivot table first which put the data in a very funky format causing me to have to do a transpose, which wasn't really necessary. Hopefully this can help others....
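For anyone doing the same check outside Excel, the match-then-filter step can be sketched in a few lines of Python. The rows below are copied from the sample tables above; the variable names are only illustrative:

```python
# GTINs are kept as strings so that leading zeros are not lost.
source_rows = [
    ("00030085075605", "brand 1", "name 1", "cat1"),
    ("00041100015112", "brand 2", "name 2", "cat2"),
    ("00041100015099", "brand 3", "name 3", "cat3"),
    ("00030085075608", "brand 4", "name 4", "cat4"),
]
gtins_to_match = {
    "00030085075605",
    "00041100015112",
    "00041100015098",
    "00030085075608",
}

# Same idea as the IFERROR/MATCH column: flag each row with yes or no.
flagged = [
    (row, "yes" if row[0] in gtins_to_match else "no") for row in source_rows
]

# Same idea as filtering the sheet for the yes values.
matched = [row for row, flag in flagged if flag == "yes"]

for row in matched:
    print(row)
```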