Multiple directories with multiple files in U-SQL without date time - azure-data-lake

I want to read multiple files from multiple folders using U-SQL, without relying on a date-time pattern in the path.
The folder structure is:
input
    input1
        file1.csv
        file2.csv
    input2
        file3.csv

You can try a file set pattern:
"input/input{*}/file{*}.csv"

Jamil's answer will work. If you want more control, you can simply list the files you want as a comma-separated list, e.g.
@data =
    EXTRACT col1 int,
            col2 int,
            col3 int,
            col4 int
    FROM "input/input1/file1.csv",
         "input/input1/file2.csv",
         "input/input2/file3.csv"
    USING Extractors.Csv();

Related

Decimal input gets rounded in create external table in Hive from CSV

I am creating a table in Hive from a CSV (comma separated) that I have in HDFS. I have three columns: two strings and a decimal one (with at most 18 digits after the decimal point and one before). Below is what I do:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table(
col1 STRING, col2 STRING, col_decimal DECIMAL(19,18))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hdfs_path/folder/';
My col_decimal is being rounded to 0 when the value is below 1 and to 1 when it is not actually fractional (I only have numbers between 0 and 1).
Any idea what is wrong? By the way, I have also tried DECIMAL(38,37).
Thank you.
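One way to narrow this down (a diagnostic sketch, not from the original thread, reusing the HDFS path from the question) is to load the decimal column as a STRING first and cast it in a query, which shows whether the raw text parses as the expected decimal at all:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table_raw(
    col1 STRING, col2 STRING, col_decimal STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hdfs_path/folder/';

-- compare the raw text with its casted value
SELECT col_decimal, CAST(col_decimal AS DECIMAL(19,18)) AS casted
FROM my_table_raw
LIMIT 10;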

How to select all values that are not a number using SQL?

Due to an earlier error, I ended up with letters and symbols in places where I should have had integers and floats. At this time, I don't know the extent of the problem and am working to correct the code as we move forward.
As of right now, when I run SELECT DISTINCT col1 FROM table; I get integers, floats, symbols and letters. A few million of them.
How can I update the SQL to exclude all numbers? In other words, show me only letters and symbols.
You can use the GLOB operator:
select col1
from tablename
where col1 GLOB '*[^0-9]*'
This will return all values of col1 that contain at least one character other than a digit.
You may change it to include '.' in the list of chars:
where col1 GLOB '*[^0-9.]*'
If what you want is values that do not contain any digits then use this:
select col1
from tablename
where col1 not GLOB '*[0-9]*'
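A small self-contained illustration of the two GLOB queries above (the table name and sample values are made up):
CREATE TABLE tablename(col1 TEXT);
INSERT INTO tablename VALUES ('123'), ('1.23'), ('abc'), ('12a'), ('$50');

-- contains at least one non-digit character: returns '1.23', 'abc', '12a', '$50'
SELECT col1 FROM tablename WHERE col1 GLOB '*[^0-9]*';

-- contains no digits at all: returns only 'abc'
SELECT col1 FROM tablename WHERE col1 NOT GLOB '*[0-9]*';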
Hmmm . . . SQLite doesn't have regular expressions built in, making this a bit of a pain. If the column actually contains a mix of numbers and strings (which is possible in SQLite), you can use
where typeof(col1) = 'text'
If the types are all text (so '1.23' rather than 1.23), then this may do what you want:
where cast(col1 + 0 as text) <> col1

Need to separate a string in a column into multiple columns separated by the ';' delimiter in BigQuery

I have the scenario below.
My column value looks like this:
loc=adhesion;pos=31;refresh=559;tag=post;tag=article;tag=business;
I want to separate the values on the ';' delimiter.
Please suggest the code to generate the result below in BigQuery.
Example: my output should look like this, with a column created for each of the values below:
col1 : loc=adhesion
col2 : pos=31
col3 : refresh=559
col4 : tag=post
and so on
Thank you for your help.
You can do this using the SPLIT and NTH functions. See the link below for an example of how to use it.
BigQuery: SPLIT() returns only one value
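For reference, a minimal sketch in standard BigQuery SQL that does the same split with array offsets instead of the legacy NTH function, assuming the string sits in a column named raw of a table my_table (both names are placeholders):
SELECT
    parts[SAFE_OFFSET(0)] AS col1,   -- loc=adhesion
    parts[SAFE_OFFSET(1)] AS col2,   -- pos=31
    parts[SAFE_OFFSET(2)] AS col3,   -- refresh=559
    parts[SAFE_OFFSET(3)] AS col4    -- tag=post
FROM (
    SELECT SPLIT(raw, ';') AS parts
    FROM my_table
);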

Import a CSV file into an H2 database with several columns

In my CSV file there is:
prenom; nom; age
prenom1; nom1; age1
prenom2; nom2; age2
...
When I import my CSV file using this command:
CREATE TABLE TEST AS SELECT * FROM CSVREAD('C:\Users\anonymous\Desktop\test.csv');
The main problem is that my table ends up with a single column containing the whole CSV line.
I would like 3 columns (prenom, nom and age) with the data in each column.
Thanks for your help!
As @jdv stated, you must specify the field separator if it is not the default comma. The null argument means the column names will be parsed from the first row.
CREATE TABLE TEST AS SELECT * FROM CSVREAD('C:\Users\anonymous\Desktop\test.csv',null,'fieldSeparator=;');
Keep in mind you may have to specify charset=Cp1252 as well, if the CSV file was generated with Excel. If a word like prénom shows up with garbled characters in place of the é, you have the wrong encoding.
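If both options are needed, they can be combined in the same space-separated options string (a sketch reusing the path from the question):
CREATE TABLE TEST AS
SELECT * FROM CSVREAD('C:\Users\anonymous\Desktop\test.csv', null, 'charset=Cp1252 fieldSeparator=;');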

Import Data Into Hive Containing Whitespace

I am importing data from a csv file into Hive. My table contains both strings and ints. However, in my input file, the ints have whitespace around them, so it kind of looks like this:
some string, 2 ,another string , 7 , yet another string
Unfortunately I cannot control the formatting of the program providing the file.
When I import the data using (e.g.):
CREATE TABLE MYTABLE(string1 STRING, alpha INT, string2 STRING, beta INT, string3 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Then all my integers get set to NULL. I am assuming this is because the extra whitespace makes the parsing fail. Is there a way around this?
You can perform a multi-stage import. In the first stage, save all of your data as STRING, and in the second stage use TRIM() to remove whitespace and then save the data as INT. You could also look into using Pig to read the data from your source files as raw text and then write it to Hive with the correct data types.
Edit
You can also do this in one pass if you can point an external table at your source data.
CREATE EXTERNAL TABLE myTable(
    string1 STRING, alpha STRING, string2 STRING, beta STRING, string3 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/server/path/';   -- the directory that contains the CSV file

INSERT INTO TABLE myOtherTable
SELECT string1,
       CAST(TRIM(alpha) AS INT),
       string2,
       CAST(TRIM(beta) AS INT),
       string3
FROM myTable;
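If the target table does not exist yet, the same cleanup can also be written as a single CREATE TABLE ... AS SELECT (a sketch, with a made-up table name):
CREATE TABLE myCleanTable AS
SELECT string1,
       CAST(TRIM(alpha) AS INT) AS alpha,
       string2,
       CAST(TRIM(beta) AS INT) AS beta,
       string3
FROM myTable;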