Google BigQuery SQL, selecting a range of records from a large dataset

I am querying the GDELT 2.0 database and tried to export the entire result set, but it is too big. I set a LIMIT of 10000 and got those rows, and now I want to get the next 10000 records, and so on, until I have collected the whole result. This is the original query:
SELECT * FROM [gdelt-bq:gdeltv2.eventmentions] WHERE MentionIdentifier LIKE '%bitcoin%'
I tried doing what is explained in this link: http://www.plus2net.com/sql_tutorial/sql_limit.php
However, it does not go through; I am getting an EOF error, and adding a semicolon doesn't fix it. If you know how to do this, please help.
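For reference, a minimal sketch of one way to page through the results. This assumes BigQuery standard SQL rather than the legacy bracket syntax (the legacy dialect may not accept the OFFSET clause shown in that tutorial, which could explain the parse error), and it assumes GLOBALEVENTID is a usable ordering column:

SELECT *
FROM `gdelt-bq.gdeltv2.eventmentions`
WHERE MentionIdentifier LIKE '%bitcoin%'
ORDER BY GLOBALEVENTID      -- a stable ordering keeps pages consistent
LIMIT 10000 OFFSET 10000;   -- rows 10001-20000; raise OFFSET by 10000 per page

Note that each page rescans the full result, so for a one-shot export it is usually cheaper to save the results to a table and export that table to Cloud Storage.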

Related

Getting error while querying to get tables partition metadata in bigquery

My BigQuery project has one dataset and around 1,005 tables. I am running a query to get partition metadata for the tables.
Query is
SELECT count(*) FROM bq-tf-test-500-298.unravelFr8ks4.INFORMATION_SCHEMA.PARTITIONS;
I am getting the following error: INFORMATION_SCHEMA.PARTITIONS query attempted to read too many tables. Please add more restrictive filters.
The PARTITIONS view is currently in Preview, so the number of tables it can read may be limited. As the message suggests, adding more restrictive filters should let your query run.
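For example, a minimal sketch of a more restrictive query (the PARTITIONS view exposes a table_name column you can filter on; 'my_table' is a placeholder):

SELECT table_name, partition_id, total_rows
FROM `bq-tf-test-500-298.unravelFr8ks4.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'my_table';  -- placeholder table name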

BigQuery - BQ extract - Multiple empty file generation

I'm trying to export data from a BigQuery table to a zip file on the command line using bq extract. It generated multiple empty files (with headers), along with one file containing the correct data. Can someone please let me know why the empty files are generated?
Thanks
This is an already-reported BigQuery issue. I suggest starring the issue and asking for an update on it.
I faced the same empty-files issue when using EXPORT DATA.
After a bit of R&D I found a workaround: put a LIMIT in your SELECT statement and it will do the trick.
You can find the row count first and use that as the LIMIT value:
SELECT ....
FROM ...
WHERE ...
LIMIT xxx
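For context, a minimal sketch of the EXPORT DATA form with that workaround (the bucket, project, dataset, and table names are placeholders):

EXPORT DATA OPTIONS (
  uri = 'gs://my-bucket/export/part-*.csv',  -- placeholder bucket path
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT *
FROM `my-project.my_dataset.my_table`
LIMIT 1000000;  -- set this to the row count found with SELECT COUNT(*)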

How to use Vertica's COPY LOCAL as an sql statement from MATLAB on Windows

I'm trying to insert around 80 million records created in MATLAB into a Vertica database table. I want to know whether the COPY LOCAL statement can be called from MATLAB as a regular SQL statement using exec(conn, sql). As a test, I tried a .dat file with around 4 million records, as follows:
sqlstmnt = 'COPY schema.table_name (FK_CUSTOMER_ID,FK_RUN_START_DATE_ID,FK_RUN_END_DATE_ID,FK_TRAVEL_ID,FK_ORIGIN_ID,FK_DEST_ID,FK_SEGMENT_ID,SEGMENT_PERCENTAGE,LAST_UPDATED) FROM LOCAL ''/my/file/full/path/test1.dat''';
results = exec(conn,sqlstmnt);
But it gave an error in results.Message like:
[Vertica]JDBC A ResultSet was expected but not generated from query "COPY schema.table_name(FK_CUSTOMER_ID,FK_RUN_START_DATE_ID,FK_RUN_END_DATE_ID,FK_TRAVEL_ID,FK_ORIGIN_ID,FK_DEST_ID,FK_SEGMENT_ID,SEGMENT_PERCENTAGE,LAST_UPDATED) FROM LOCAL '/my/file/full/path/test1.dat'". Query not executed.
I have the data in the '.dat' file in the order in which the columns are mentioned in COPY LOCAL.
I could not find any helpful resource explaining this error.
I am able to load this test1.dat file using COPY from vsql, but since my MATLAB code runs many iterations, each producing about a million records, I want to insert the records during each iteration. Any help would be really great.
The COPY command returns a ResultSet that includes the amount of loaded data. I see two main options:
1) results = exec(conn, sqlstmnt);
2) results = runsqlscript(conn, 'nameOfSQLScriptThatIncludesTheCopyCommand.sql') (a sketch of the script's contents follows)
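For option 2, the script file would hold just the COPY statement; a minimal sketch using the same column list (the DELIMITER and ABORT ON ERROR options are assumptions, not taken from the question):

COPY schema.table_name (FK_CUSTOMER_ID, FK_RUN_START_DATE_ID, FK_RUN_END_DATE_ID,
  FK_TRAVEL_ID, FK_ORIGIN_ID, FK_DEST_ID, FK_SEGMENT_ID,
  SEGMENT_PERCENTAGE, LAST_UPDATED)
FROM LOCAL '/my/file/full/path/test1.dat'
DELIMITER ','      -- assumed field separator
ABORT ON ERROR;    -- fail fast instead of silently rejecting rows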
I hope you will find it useful
Thanks
I just finished reviewing your input sample data.
I see a major problem with the mapping of the input CSV to the target table.
The main issues are:
1) Records are broken across two lines (you should have one record per line and avoid breaking it into two). E.g.:
"1,20150101,0,2,2573,2714,1,8.147237e-01
50,48,49,54,45,48,51,-28 12:11:46"
2) When you define data types on the Vertica table (e.g. TIMESTAMP), the data in the CSV must match them; what you have is "-28 12:11:46", which will not parse. See the sample line below.
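For example, a corrected record might look like this: a single line, with a timestamp in a format Vertica's TIMESTAMP type can parse (the values are illustrative, not taken from the question):

1,20150101,0,2,2573,2714,1,0.8147237,2015-01-28 12:11:46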
After you fix all these issues, make sure you test the load using vsql, then try it from MATLAB.
I hope you will find it useful.

Hive CSV Import Limit

I have a large CSV file with about 3.3 million rows that I have uploaded to the Hive metastore and created a table from.
However, when I run a
select count(*) from table
query on it, it shows only about 1.7 million rows.
I've run a
select * from table
query and downloaded the results as a CSV; the file has only about 1.7 million rows in it.
Is there a size limit on a csv file that you can import into hive and create a table from?
Any tips greatly appreciated.
I would suggest checking your file again; the scenario you describe can occur for a couple of reasons:
1) You don't have that many records in the file.
2) Some of your rows are not separated by newlines, so records are getting merged; that is why you are getting fewer records. A quick check is sketched below.
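One way to check (a sketch; the table name and HDFS location are placeholders) is to count raw lines by loading each line into a single string column, since Hive counts one row per newline-terminated line:

CREATE EXTERNAL TABLE raw_lines (line STRING)
LOCATION '/user/hive/staging/mycsv';  -- placeholder directory holding the CSV

SELECT COUNT(*) FROM raw_lines;  -- if this is also ~1.7 million, the file has fewer newline-terminated lines than expected, i.e. records were merged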
Hope this helps...!!!

Google BigQuery - Error while downloading data to a table

I am trying to work with the GitHub data which has been uploaded to Google BigQuery. I ran a few queries that generated a lot of rows, e.g.:
SELECT actor_attributes_login, repository_watchers, repository_forks FROM [githubarchive:github.timeline]
WHERE repository_watchers > 2 AND REGEXP_MATCH(repository_created_at, '2012-')
ORDER BY actor_attributes_login;
The result had more than 220,000 rows. When I attempted to download it as CSV, it said:
Download Unavailable
This result set contains too many rows for direct download. Please use "Save as Table" and then export the resulting table.
When I tried Save as Table, I got the following error:
Access Denied: Job publicdata:job_c2338ba91e494b21970854e13cdc4b2a: RUN_JOB
Also, I ran queries where I limited the number of rows to 200 or so; even in such cases I got the error above when saving as a table, though I was able to download those results as CSV.
Any solution to this problem?
@Anerudh You don't have access to modify the publicdata samples dataset. Create a brand new dataset, and try to save your query results to a new table in that dataset. A sketch of that flow follows.
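A minimal sketch of that flow in BigQuery standard SQL DDL (the dataset and table names are placeholders, and the original legacy-SQL query is rewritten with standard-SQL equivalents):

CREATE SCHEMA IF NOT EXISTS my_dataset;  -- a dataset you own

CREATE TABLE my_dataset.github_2012 AS
SELECT actor_attributes_login, repository_watchers, repository_forks
FROM `githubarchive.github.timeline`
WHERE repository_watchers > 2
  AND REGEXP_CONTAINS(repository_created_at, r'2012-');  -- standard-SQL analogue of REGEXP_MATCH

The saved table can then be exported, as the error message suggests.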