Downloading data from BigQuery with Stata - google-bigquery

How can I download data (either with queries or the full table) from BigQuery with Stata?
In Stata, I would like to run something like:
download "SELECT * FROM `project.dataset.table`" "~/Downloads/table.csv"
and have it download the query result to a local CSV file.

You should be able to use odbc to do this. For example, your code might be:
odbc load, exec(`"SELECT * FROM project.dataset.table"') dsn("ds_name") clear
export delimited using "~/Downloads/table.csv", replace
But this depends on BigQuery being reachable through ODBC, so you may first need to install a BigQuery ODBC driver and configure a DSN for it. If so, you may find this presentation and this documentation helpful.
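If you want the one-line download command from your question, you could wrap those two commands in a small program. This is only a sketch: it assumes a working BigQuery DSN named "ds_name" already exists, and the program name and argument handling are just one way to do it.
capture program drop bq_download
program define bq_download
    args query outfile
    // assumes a BigQuery ODBC DSN named "ds_name" is already configured
    odbc load, exec(`"`query'"') dsn("ds_name") clear
    export delimited using `"`outfile'"', replace
end
After running this once, bq_download "SELECT * FROM project.dataset.table" "~/Downloads/table.csv" behaves roughly like the command you describe.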

Related

SQL Server - Copying data between tables where the Servers cannot be connected

We want some of our customers to be able to export some data into a file and then we have a job that imports that into a blank copy of a database at our location. Note: a DBA would not be involved. This would be a function within our application.
We can ignore table schema differences - they will match - but we do have several different tables to deal with.
So on the customer side the function would run something like:
insert into myspecialstoragetable select * from source_table
insert into myspecialstoragetable select * from source_table_2
insert into myspecialstoragetable select * from source_table_3
I would then run a select * from myspecialstoragetable to produce a .sql file they can ship to me, which we would then import into our copy of the database with some job or SQL script.
I'm thinking we can use XML somehow, but I'm a little lost.
Thanks
Have you looked at the bulk copy utility bcp? You can wrap it with your own program to make it easier for less sophisticated users.
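For example (a rough sketch only - the server, database, table, and path names are placeholders), the export on the customer's side and the import on yours could be two bcp calls:
bcp "SELECT * FROM CustomerDb.dbo.source_table" queryout "C:\export\source_table.dat" -c -t"|" -S CUSTOMERSERVER -T
bcp OurDb.dbo.myspecialstoragetable in "C:\export\source_table.dat" -c -t"|" -S OURSERVER -T
Your wrapper program would just build and run the first command for each table and package the resulting files for shipping.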
Since it is a function within your application, in what language is the application front-end written? If it is .NET, you can use Data Transformation Services (DTS) in SQL Server to do a sample export. In the last step, you could save the steps into a VB/.NET module. If necessary, modify this file to change table names etc., and integrate this DTS module into your application. When doing the sample export, export to a suitable format such as .CSV or Excel - whichever format you will be able to import into a blank database.
Every time the user wants to do an export, he will have to click a button that invokes the DTS module integrated into your application, which will dump the data to the desired format. He can then mail that file to you.
If your application is not written in .NET, whatever language it is written in will have options to read data from SQL Server and dump it to a .CSV or delimited text file. If it is a primitive language, you may have to do it yourself by looping through the records, concatenating the fields of each one, and writing them to a file.
XML would be too far-fetched for this, though it's not impossible. At your end, you would need the ability to parse the XML file and import it at your location. Also, XML is not really suited if the number of records is very large.
You are probably thinking of a .sql file, as in MySQL. In SQL Server, the .sql files generated by the 'Generate Scripts' function of SQL Server's interface are used for table structures/DDL rather than for generating INSERT statements containing each record's values.

How to transfer SQL query results to a new csv file with headers?

I want to transfer SQL query results to a new CSV file. This is because I have placed my SQL query inside a loop, which will export the query results to a CSV file on each iteration. I'm using MS SQL Server 2012. I don't want to use the GUI option.
SQL Server is not really designed to import and export files. You can use the bulk copy program (bcp), but I don't think it works inside T-SQL code (looping). You can use OPENROWSET, but you need to set a special flag that widens your attack surface, which some do not want to do.
The answer is SSIS (or a tool like Talend). It comes with SQL Server and is designed by MS as the go-to tool for importing to and exporting from SQL Server. If you were to right-click on the database and choose Tasks and then Export Data, the wizard eventually creates and executes an SSIS package.
I recommend you reconsider a GUI option.
PS - Another answer was to use 'Save Results As'. I have heard of problems using this method, including problems with delimiters and text-qualified fields.
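If you do want to drive bcp from a T-SQL loop anyway, the usual workaround is xp_cmdshell. A rough sketch (assuming xp_cmdshell has been enabled, and that the database, table, batch_id filter, and paths are placeholders):
DECLARE @i int = 1;
DECLARE @cmd varchar(1000);
WHILE @i <= 3
BEGIN
    SET @cmd = 'bcp "SELECT col1, col2 FROM MyDb.dbo.MyTable WHERE batch_id = '
        + CAST(@i AS varchar(10))
        + '" queryout "C:\export\batch_' + CAST(@i AS varchar(10)) + '.csv" -c -t, -T -S localhost';
    EXEC master..xp_cmdshell @cmd;   -- run bcp once per iteration of the loop
    SET @i += 1;
END
Note that bcp does not write a header row by itself, so if you need headers you would still have to prepend the column names to each file.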
There are multiple ways to attain this. You can export the result set using BCP, using Import/Export, or using CTRL+SHIFT+S (this will change the result set output to Save As). Hope this may help.

copy blob data into on-premise sql table

My problem statement is that I have a CSV blob and I need to import that blob into a SQL table. Is there a utility to do that?
I was thinking of one approach: first copy the blob to the on-premise SQL Server machine using the AzCopy utility, and then import that file into a SQL table using the bcp utility. Is this the right approach? I am also looking for a one-step solution to copy a blob into a SQL table.
Regarding your question about the availability of a utility which will import data from blob storage to a SQL Server, AFAIK there's none. You would need to write one.
Your approach seems OK to me, though you may want to write a batch file or something like that to automate the whole process. In this batch file, you would first download the file to your computer and then run the BCP utility to import the CSV into SQL Server (a rough sketch of such a batch file follows the alternatives below). Other alternatives to writing a batch file are:
Do this thing completely in PowerShell.
Write some C# code that makes use of the storage client library to download the blob and, once the blob is downloaded, start the BCP process from your code.
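For example, such a batch file might look roughly like the following. This is only a sketch: the storage account, container, file, SAS token, server, and table names are placeholders, it assumes the newer azcopy copy syntax, and -F 2 assumes the CSV has a header row to skip.
REM download the blob to a local folder (SAS token appended to the URL)
azcopy copy "https://myaccount.blob.core.windows.net/mycontainer/myfile.csv?<SAS>" "C:\data\myfile.csv"
REM bulk-load the CSV into the target table, skipping the header row
bcp MyDb.dbo.MyTable in "C:\data\myfile.csv" -c -t, -S MYSQLSERVER -T -F 2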
To pull a blob file into an Azure SQL Server, you can use this example syntax (this actually works, I use it):
BULK INSERT MyTable
FROM 'container/folder/folder/file'
WITH ( DATA_SOURCE = 'ds_blob',BATCHSIZE=10000,FIRSTROW=2);
MyTable has to have columns matching the file (or it can be a view against a table that yields matching columns)
In this example, ds_blob is an external data source which needs to be created beforehand (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql)
The external data source needs to use a database scoped credential, which uses an SAS key that you need to generate beforehand from blob storage (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql)
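Put together, the one-time setup might look roughly like this (the account URL, credential name, password, and SAS secret are placeholders, and the SAS token goes in without the leading '?'):
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';  -- needed once per database if no master key exists yet
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token without the leading ?>';
CREATE EXTERNAL DATA SOURCE ds_blob
WITH ( TYPE = BLOB_STORAGE,
       LOCATION = 'https://myaccount.blob.core.windows.net',
       CREDENTIAL = BlobCredential );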
The only downside to this method is that you have to know the filename beforehand - there's no way to enumerate the blobs from inside SQL Server.
I get around this by running PowerShell inside Azure Automation that enumerates the blobs and writes them into a queue table beforehand.

Generate DDL SQL create table statement after scanning CSV file

Are there any command line tools (Linux, Mac, and/or Windows) that I could use to scan a delimited file and output a DDL create table statement with the data type determined for me?
Did some googling, but couldn't find anything. Was wondering if others might know, thanks!
DDL-generator can do this. It can generate DDLs for YAML, JSON, CSV, Pickle and HTML (although I don't know how the last one works). I just tried it on some data exported from Salesforce and it worked pretty well. Note that you need to use it with Python 3; I could not get it to work with Python 2.7.
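The command-line usage is roughly as follows (the dialect and file names are just examples; check the package documentation for the exact invocation):
pip install ddlgenerator
ddlgenerator postgresql mydata.csv > create_mydata.sql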
You can also try https://github.com/mshanu/idli. It can take a CSV file as input and generate a CREATE statement with appropriate types. It can generate DDL for MySQL, Oracle, and Postgres. I am actively working on this and am happy to receive feedback for future improvements.

Querying (SQL) Oracle/Toad dump files without importing them

We're doing a monthly data dump of our databases using Toad for Oracle's Export function. We've got some SQL queries to create statistics about the data. I'd like to compare the results of the current state with the last few dumps.
I can open the files with the Export File Browser in Toad (v11) and sort/filter the data using the GUI, but that's not powerful enough. Is there a way to query the dump files with SQL without having to take extra steps like creating a new schema and importing it?
By far the best way would be to reimport the data.