BigQuery - BQ extract - Multiple empty file generation

I'm trying to export data from a BigQuery table to a zip file from the command line using bq extract. It generated multiple empty files (containing only the header), along with one file with the correct data. Can someone please let me know why the empty files are generated?
Thanks

This is a known BigQuery issue that has already been reported. I suggest starring the issue and asking for an update on it.

I faced the same empty-files issue when using EXPORT DATA.
After a bit of research I found a workaround: put a LIMIT xxx clause in your SELECT statement and it will do the trick.
You can find the row count first and use that as the LIMIT value:
SELECT ....
FROM ...
WHERE ...
LIMIT xxx
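For reference, a minimal sketch of that EXPORT DATA workaround might look like the following; the project, dataset, table, column, and bucket names are hypothetical, and the LIMIT value would be the row count you looked up:

EXPORT DATA
  OPTIONS (
    uri = 'gs://my-bucket/exports/part-*.csv.gz',  -- hypothetical bucket; exactly one '*' wildcard is required
    format = 'CSV',
    compression = 'GZIP',
    header = true,
    overwrite = true
  )
AS
SELECT col_a, col_b
FROM `my-project.my_dataset.my_table`   -- hypothetical table
WHERE col_a IS NOT NULL
LIMIT 100000;                           -- set this to the actual row count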

Related

Trying to create a table and load data into same table using Databricks and SQL

I Googled for a solution to create a table using Databricks and Azure SQL Server, and to load data into that same table. I found some sample code online, which seems pretty straightforward, but apparently there is an issue somewhere. Here is my code.
CREATE TABLE MyTable
USING org.apache.spark.sql.jdbc
OPTIONS (
url "jdbc:sqlserver://server_name_here.database.windows.net:1433;database = db_name_here",
user "u_name",
password "p_wd",
dbtable "MyTable"
);
Now, here is my error.
Error in SQL statement: SQLServerException: Invalid object name 'MyTable'.
My password, unfortunately, has spaces in it. That could be the problem, perhaps, but I don't think so.
Basically, I would like this to recursively loop through the files in a folder and its sub-folders, and load data from every file whose name matches a pattern like 'ABC*' into a single table. The blocker is that I also need the file name loaded into a field. So I want to load data from MANY files into 4 fields of actual data, plus 1 field that captures the file name. The only way I can distinguish the different data sets is by the file name. Is this possible? Or is this an exercise in futility?
My suggestion is to use the Azure SQL Spark library, as also mentioned in the documentation:
https://docs.databricks.com/spark/latest/data-sources/sql-databases-azure.html#connect-to-spark-using-this-library
'Bulk Copy' is what you want to use to get good performance. Just load your file into a DataFrame and bulk copy it to Azure SQL:
https://docs.databricks.com/data/data-sources/sql-databases-azure.html#bulk-copy-to-azure-sql-database-or-sql-server
To read files from subfolders, the answer is here:
How to import multiple csv files in a single load?
I finally, finally, finally got this working.
// Read every pipe-delimited, gzipped file matching the ABC* pattern in one load
val myDFCsv = spark.read.format("csv")
  .option("sep", "|")             // pipe-delimited input
  .option("inferSchema", "true")  // let Spark infer the column types
  .option("header", "false")      // the files have no header row
  .load("mnt/rawdata/2019/01/01/client/ABC*.gz")

myDFCsv.show()
myDFCsv.count()
Thanks for a point in the right direction mauridb!!
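For the "file name in a field" part of the question, a small Spark SQL sketch along these lines could complement the snippet above; the view name is hypothetical, and the path and options mirror the Scala code:

CREATE TEMPORARY VIEW abc_raw
USING csv
OPTIONS (
  path 'mnt/rawdata/2019/01/01/client/ABC*.gz',
  sep '|',
  header 'false',
  inferSchema 'true'
);

-- input_file_name() tags each row with the file it came from
SELECT *, input_file_name() AS source_file
FROM abc_raw;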

Using Delete as header

I am using the Text_JDBC40 jar and trying to fetch data from a CSV file using a SQL query. In the CSV file I have a header named Delete, so when I try to fetch data I get the error below.
Syntax error: Stopped parse at Delete
Renaming this column to some other name fetches the data properly. Any idea why this is happening? Also, is there any other option for the Text_JDBC40 jar?
If you are trying to import data into SQL from a CSV, quote Delete with a grave accent (backtick, `) when creating and retrieving the data. Please try this; it may be helpful.
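As a rough illustration of that suggestion (the table and second column name here are made up), the reserved word is backtick-quoted so the parser reads it as an identifier rather than the DELETE keyword:

SELECT `Delete`, other_column
FROM csv_table
WHERE `Delete` = 'Y';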

How to use Vertica's COPY LOCAL as an sql statement from MATLAB on Windows

I'm trying to insert around 80 million records created using MATLAB into a Vertica database table. I wanted to know if we can call a COPY LOCAL statement from MATLAB as a regular SQL statement using exec(conn, sql). For test purposes, I tried with a .dat file having around 4 million records, as follows:
sqlstmnt = 'COPY schema.table_name (FK_CUSTOMER_ID,FK_RUN_START_DATE_ID,FK_RUN_END_DATE_ID,FK_TRAVEL_ID,FK_ORIGIN_ID,FK_DEST_ID,FK_SEGMENT_ID,SEGMENT_PERCENTAGE,LAST_UPDATED) FROM LOCAL ''/my/file/full/path/test1.dat''';
results = exec(conn,sqlstmnt);
But it gave an error in results.Message like:
[Vertica]JDBC A ResultSet was expected but not generated from query "COPY schema.table_name(FK_CUSTOMER_ID,FK_RUN_START_DATE_ID,FK_RUN_END_DATE_ID,FK_TRAVEL_ID,FK_ORIGIN_ID,FK_DEST_ID,FK_SEGMENT_ID,SEGMENT_PERCENTAGE,LAST_UPDATED) FROM LOCAL '/my/file/full/path/test1.dat'". Query not executed.
The data in the '.dat' file is in the order in which the columns are mentioned in COPY LOCAL.
I could not find any helpful resource explaining this error.
I am able to insert this test1.dat file using COPY from vsql, but since I run my code in MATLAB over many iterations, each producing about a million records, I would like to insert them during each iteration. Any help will be really great.
The COPY command returns a ResultSet that includes the amount of data loaded. I see two main options:
1) results = exec(conn,sqlstmnt);
2) results = runsqlscript(conn,'nameOfSQLScriptthatIncludeTheCopyCommand.sql')
A sketch of what such a script could contain follows below. I hope you will find it useful.
Thanks
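For option 2, the referenced .sql script might look roughly like this; the column list and file path are taken from the question, while the comma delimiter is an assumption (Vertica defaults to '|' when no DELIMITER clause is given):

COPY schema.table_name (
  FK_CUSTOMER_ID, FK_RUN_START_DATE_ID, FK_RUN_END_DATE_ID, FK_TRAVEL_ID,
  FK_ORIGIN_ID, FK_DEST_ID, FK_SEGMENT_ID, SEGMENT_PERCENTAGE, LAST_UPDATED
)
FROM LOCAL '/my/file/full/path/test1.dat'
DELIMITER ','       -- assumed; adjust to match the actual file
ABORT ON ERROR;     -- fail fast instead of silently rejecting rows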
I just finished reviewing your input sample data.
I see a major problem with the mapping of the input CSV to the target table.
The main issues are:
1) Records are broken across 2 lines (you should have one record per line and avoid breaking it across 2 lines), e.g.:
"1,20150101,0,2,2573,2714,1,8.147237e-01
50,48,49,54,45,48,51,-28 12:11:46"
2) When you define data types on the Vertica table, e.g. TIMESTAMP, the data in the CSV must match them (what you have is "-28 12:11:46", which will not work).
After you fix all these issues, make sure you test it using vsql, then go and try it with MATLAB.
I hope you will find it useful.

SQL syntax error when using Heroku dataclips to export PostgreSQL database into csv

I have a Rails app on Heroku that I'm currently testing to ensure that I can download the information it gathers. I've managed to get PostgreSQL 9.3.5 working and can even get it to spit out a public URL to an unreadable dump file, but I want to export a particular table into a CSV that is easier to understand, so that I can gather the data.
I've been looking into Heroku Dataclips. The documentation says that this is possible, but doesn't explain how. This site seemed to give some tips on SQL inputs:
http://www.gistutor.com/postgresqlpostgis/10-intermediate-postgresqlpostgis-tutorials/39-how-to-import-or-export-a-csv-file-using-postgresql-copy-to-and-copy-from-queries.html
So I entered this into Dataclips:
COPY participations(user_full_name, user_email, event_name, event_date_time)
TO '/usr/local/pgsql/data/csv/event_registrations.csv'
WITH DELIMITER ‘,’
CSV HEADER
However, I get this error:
Your query couldn't be created.
ERROR: syntax error at or near "COPY"
LINE 2: COPY participation(user_full_name, user_email, event_name, e...
^
How can I fix this? Maybe the reference I'm using is wrong, because I don't see the difference between what I'm doing and what's there.
FWIW, I'm using the Cloud9 IDE as my terminal.
If you are trying to get the data out into a CSV file, then try running this from the command line (psql) and put a "\" before COPY, like this:
\COPY participations(user_full_name, user_email, event_name, event_date_time)
TO '/usr/local/pgsql/data/csv/event_registrations.csv'
WITH DELIMITER ','
CSV HEADER
Alternatively, you can download pgAdmin; it has an option to execute a query to a file under the Query menu at the top.
According to Heroku support, this is what you need to put in a Dataclip if you want to get all the records from a particular table:
SELECT * from table_name;
Once you create your Dataclip, you will have the option through the Dataclips interface to download the results as a CSV.
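For completeness, a Dataclip for the specific table in the question might look roughly like this, reusing the column names from the COPY attempt above:

SELECT user_full_name, user_email, event_name, event_date_time
FROM participations;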

Google BigQuery - Error while downloading data to a table

I am trying to work with the GitHub data that has been uploaded to Google BigQuery. I ran a few queries which generated a lot of rows, e.g.:
SELECT actor_attributes_login, repository_watchers, repository_forks
FROM [githubarchive:github.timeline]
WHERE repository_watchers > 2 AND REGEXP_MATCH(repository_created_at, '2012-')
ORDER BY actor_attributes_login;
The result had more than 220,000 rows. When I attempted to download it as CSV, it said:
Download Unavailable
This result set contains too many rows for direct download. Please use "Save as Table" and then export the resulting table.
When I tried Save as Table, I got the following error:
Access Denied: Job publicdata:job_c2338ba91e494b21970854e13cdc4b2a: RUN_JOB
Also, I ran queries where I limited the number of rows to 200 or so; even in those cases I got the error mentioned above when saving as a table. However, I was able to download those results as CSV.
Any solution to this problem?
@Anerudh You don't have access to modify the publicdata samples dataset. Create a brand new dataset of your own, and try to save your query results to a new table in that dataset.
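As a rough sketch of that idea in today's BigQuery standard SQL (not the legacy dialect used in the question), materializing results into a dataset you own could look like this; the dataset and table names are placeholders, and the legacy-SQL query would need translating to standard SQL first:

CREATE TABLE my_dataset.github_watchers AS
SELECT actor_attributes_login, repository_watchers, repository_forks
FROM my_dataset.timeline_copy   -- placeholder for a standard-SQL-accessible source table
WHERE repository_watchers > 2;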