BigQuery mismatch columns - google-bigquery

I have multiple .CSV files in Google Cloud Storage, but I am unable to load them into a table because the number of columns differs between files.
Let's say I have 10 .CSV files with columns A, B, C, D, and another 10 .CSV files with columns A, B, D.
When I load the .CSV files into a BigQuery table, I want column A to map to A, B to B, C to C (with NULL for the 10 files that lack it), and D to D.
Suppose 10 .CSV files have these columns:
NAME, DOB, GENDER, MOBILE NO.
Dan , 12/08/1999, MALE, 1234567889
Oliver, 17/03/19998, MALE, 5267382736
Another 10 .CSV files have these columns:
NAME, DOB, MOBILE NO.
Akash, 12/02/1999, 1234567889
Ram, 17/09/19998, 5267382736
But in the BigQuery table, the final result I want is:
NAME, DOB, GENDER, MOBILE NO.
Dan , 12/08/1999, MALE, 1234567889
Oliver, 17/03/19998, MALE, 5267382736
Akash, 12/02/1999, NULL, 1234567889
Ram, 17/09/19998, NULL, 5267382736
Can anyone help me with this?
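One option is to rewrite each file to the full column list before loading it. Below is a minimal sketch in Python, assuming the files can be preprocessed outside BigQuery and that the load job reads empty fields as NULL (both are assumptions, not something the question states). The names `normalize_csv` and `TARGET_COLUMNS` are hypothetical.

```python
import csv
import io

# Column order of the destination table, taken from the question.
TARGET_COLUMNS = ["NAME", "DOB", "GENDER", "MOBILE NO."]


def normalize_csv(text, target_columns=TARGET_COLUMNS):
    """Re-emit CSV text with every target column present, in order.

    Columns missing from the input are written as empty fields.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=target_columns,
        restval="",             # value used for columns the input lacks
        extrasaction="ignore",  # drop columns the table does not have
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Trim stray spaces around header names and values.
        cleaned = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        writer.writerow(cleaned)
    return out.getvalue()


three_col = "NAME, DOB, MOBILE NO.\nAkash, 12/02/1999, 1234567889\n"
normalized = normalize_csv(three_col)
print(normalized)
```

Files that already have all four columns pass through unchanged apart from whitespace trimming, so the same function can be applied to every file.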

Related

How to create view for my two tables in HANA

I have two tables in HANA, that is, A and B.
Data is inserted into A in batches, and B logs which batches are ready to be used in A (once an insert into A is done, its batch is logged in B).
Sample data are:
A
name score batch_id
Tom 80 1
Jack 30 1
Alex 90 1
Tom 90 2
Jack 50 2
Alex 70 2
Tom 70 3
Jack 60 3
Alex 80 3
B
table_name batch_id
A 1
A 2
A 3
I have the following SQL to get the latest batch from A that is ready to be used:
select * from A where a.batch_id = (select max(batch_id) from B where table_name = 'A')
I want to model the above SQL as an attribute view, analytic view, or calculation view, but the SQL defines no measures, so it seems it cannot be created as an analytic view or calculation view.
What type of view should I create, and how should I model it? Thanks!
You can add a Rank node on the data source for table B: partition by the column Table_Name, sort batch_id in descending order, and set the threshold value to 1. This automatically gives you the maximum value. Put a filter on the data source node so that Table_Name is A.
Then join the output of the Rank node with the data source for table A.
You should use a calculation view with at least one key figure.
Alternatively, you can use a table function and write the SQL code in it, but the bottom line is that you need a key figure.
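The logic that the Rank node and the join implement can be checked in plain SQL. The following is only a sketch, using Python's `sqlite3` as a stand-in for HANA (an assumption; the graphical calculation view itself is not shown). The view name `latest_a` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (name TEXT, score INTEGER, batch_id INTEGER);
CREATE TABLE B (table_name TEXT, batch_id INTEGER);
INSERT INTO A VALUES ('Tom',80,1),('Jack',30,1),('Alex',90,1),
                     ('Tom',90,2),('Jack',50,2),('Alex',70,2),
                     ('Tom',70,3),('Jack',60,3),('Alex',80,3);
INSERT INTO B VALUES ('A',1),('A',2),('A',3);

-- Inner query: what the filtered Rank node produces (max batch for 'A').
-- Outer join: restrict A to that batch.
CREATE VIEW latest_a AS
SELECT t.name, t.score, t.batch_id
FROM A AS t
JOIN (SELECT table_name, MAX(batch_id) AS batch_id
      FROM B
      WHERE table_name = 'A'
      GROUP BY table_name) AS latest
  ON t.batch_id = latest.batch_id;
""")

latest_rows = conn.execute(
    "SELECT name, score, batch_id FROM latest_a ORDER BY name"
).fetchall()
print(latest_rows)
```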

Separate 1 row data to multiple rows

I currently have a table that contains multiple rows and columns like below (the data in each row is somewhat grouped):
ID Column1_1 Column1_2 Column1_3 Column1_4 Column2_1 Column2_2 Column2_3 etc.
1 data11 data12 data13 etc.
2 data21 data22 etc.
3 data31 etc.
I need this table to be exported to an Excel file that looks like this:
ID Column DATA
1 Column1_1 data11
1 Column1_2 data12
1 Column1_3 data13
1 Column1_4 data14
2 Column2_1 data21
2 Column2_2 data22
I was thinking of exporting it to one big Excel file first and then creating multiple sheets. But can I use GROUP BY in Excel?
You can use UNPIVOT
SELECT
ID, ColumnName, DataValue
FROM
YourTable
UNPIVOT
(
DataValue FOR ColumnName IN (Column1_1, Column1_2, Column1_3, Column1_4, Column2_1, Column2_2, Column2_3.........)
)
AS unpvt;
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot
You are normalizing the data, and there is a step just for that called Row Normalizer (see the official documentation).
Admittedly, the first few times you will pull your hair out over the outdated vocabulary, just because Microsoft reinvented the wheel by calling this operation a 'pivot'.
In your case the Typefield is "Column", the Fieldname values are the "Column1_1"s or whatever the columns are named in your input, the Type is "Column1_1" or whatever you want to write in the column named Column in your output, and New field is "DATA" or whatever you want to name the output denormalized column.
Of course, there is a button at the bottom to help you; however, as it seems to be your first time, I suggest you first try it manually with two or three columns.
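To make the transformation concrete, here is a small sketch of the same wide-to-long reshaping in plain Python. It assumes the rows are already available as dictionaries; `unpivot` is a hypothetical helper, not part of any tool mentioned above. Like T-SQL's UNPIVOT, it skips empty cells.

```python
def unpivot(rows, id_key="ID"):
    """Turn wide rows (dicts) into (id, column, value) triples.

    Cells that are None or empty are skipped.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key or value in (None, ""):
                continue
            long_rows.append((row[id_key], column, value))
    return long_rows


wide = [
    {"ID": 1, "Column1_1": "data11", "Column1_2": "data12", "Column2_1": None},
    {"ID": 2, "Column1_1": None, "Column1_2": None, "Column2_1": "data21"},
]
long_rows = unpivot(wide)
for record in long_rows:
    print(*record)
```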

sql query to fetch only those records where sum(colNm)<xyz and store first and last records rowid/pk

I have a table with millions of records which holds information about a user, his or her documents in a BLOB, and a column holding the file size per row.
While reporting I need to extract all these records along with their attachments and store them in a folder. However, the constraint is that the folder size should not exceed 4GB.
What I need is to fetch records only up to the record where the running sum of file sizes is still less than 4GB. I have hardly any experience with databases and do not have a DB expert to refer to.
For example, say I need to fetch records only while sum(fileSize) < 9:
Name fileSize
A 1
B 2
C 3
D 2
E 9
F 4
My query needs to return records A,B,C and D.
Also, I need to store the rowID/uniqueID of the first and last record for a subsequent process.
The DB being used is IBM DB2.
Thanks!
Here is a trick for finding your file size; you can then manage the data in a procedure.
select length(file_data) from files
where length(file_data)<99999999;
LENGTH(FILE_DATA)
82944
82944
91136
3 rows selected.
select dbms_lob.getlength(file_data) from files
where length(file_data)<89999;
DBMS_LOB.GETLENGTH(FILE_DATA)
82944
82944
2 rows selected.
See also: dbms_lob.getlength() vs. length() to find blob size in Oracle
hope this helps....
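The cutoff itself can be expressed as a running total. Below is a sketch using a window function, run here against Python's `sqlite3` rather than DB2 (an assumption made so the example is runnable; DB2 offers comparable OLAP functions, but the exact syntax there is not verified here). The table `files` and its columns are hypothetical, and an `id` column is assumed to define the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, file_size INTEGER);
INSERT INTO files (name, file_size) VALUES
  ('A',1),('B',2),('C',3),('D',2),('E',9),('F',4);
""")

size_limit = 9  # stands in for the 4GB folder limit

# Keep rows while the running total of file_size (in id order) stays
# below the limit.
selected = conn.execute(
    """
    SELECT id, name
    FROM (SELECT id, name,
                 SUM(file_size) OVER (ORDER BY id) AS running_total
          FROM files)
    WHERE running_total < ?
    ORDER BY id
    """,
    (size_limit,),
).fetchall()

# First and last ids of the selected range, for the subsequent process.
first_id, last_id = selected[0][0], selected[-1][0]
print(selected, first_id, last_id)
```

The next batch could then start from the row after `last_id`, repeating the same query on the remaining rows.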

Designing SQL DB

I am creating a database which should hold this data:
Table Name: PlayerNames
Columns: FirstName, LastName, Rank
I also have data showing the head-to-head win counts. For example:
A-A = Null
A-B = 5 (A against B; A won 5 times)
A-C = 3 (A against C; A won 3 times)
B-A = 3 (B against A; B won 3 times. Note that A-B and B-A are not the same.)
B-B = Null
B-C = 4
..... and so on. I have to create a DB for this from which I can pull these records, so that viewers can select one player, then another, and see their head-to-head stats. How should the DB be designed?
Any help?
Thanks!
PLAYER
id,
firstname,
lastname,
rank,
GAME
winning_player_id,
losing_player_id,
score,
maybe?
Here is another ERM:
It seems to make sense, at least for me. But I think it will be too complicated when you start to write the update/insert queries
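As one possible reading of the first suggestion, here is a sketch that stores one row per ordered pair of players, so A-B and B-A stay separate. It uses Python's `sqlite3` as the engine (an assumption; the question does not name a database). The table `head_to_head`, its columns, and the helper `wins_against` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (
    id          INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    player_rank INTEGER
);
-- One row per ordered pair: how often winner_id beat loser_id.
CREATE TABLE head_to_head (
    winner_id INTEGER NOT NULL REFERENCES player(id),
    loser_id  INTEGER NOT NULL REFERENCES player(id),
    wins      INTEGER NOT NULL,
    PRIMARY KEY (winner_id, loser_id),
    CHECK (winner_id <> loser_id)  -- A-A is never stored
);
INSERT INTO player VALUES (1,'A','',1),(2,'B','',2),(3,'C','',3);
INSERT INTO head_to_head VALUES (1,2,5),(1,3,3),(2,1,3),(2,3,4);
""")


def wins_against(conn, winner_id, loser_id):
    """Return how often winner_id beat loser_id, or None if not recorded."""
    row = conn.execute(
        "SELECT wins FROM head_to_head WHERE winner_id = ? AND loser_id = ?",
        (winner_id, loser_id),
    ).fetchone()
    return row[0] if row else None


print(wins_against(conn, 1, 2), wins_against(conn, 2, 1))
```

A viewer picking two players would call the helper twice, once in each direction, to show both sides of the matchup.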

mysql query to dynamically convert row data to columns

I am working on a pivot table query.
The schema is as follows
Sno, Name, District
The same name may appear in many districts. Take this sample data, for example:
1 Mike CA
2 Mike CA
3 Proctor JB
4 Luke MN
5 Luke MN
6 Mike CA
7 Mike LP
8 Proctor MN
9 Proctor JB
10 Proctor MN
11 Luke MN
As you can see, I have a set of 4 distinct districts (CA, JB, MN, LP). I want to generate a pivot table that maps each name against the districts:
Name CA JB MN LP
Mike 3 0 0 1
Proctor 0 2 2 0
Luke 0 0 3 0
I wrote the following query for this:
select name,sum(if(District="CA",1,0)) as "CA",sum(if(District="JB",1,0)) as "JB",sum(if(District="MN",1,0)) as "MN",sum(if(District="LP",1,0)) as "LP" from district_details group by name
However, the number of districts may grow, in which case I will have to edit the query manually and add each new district.
I want to know whether there is a query that can dynamically take the names of the distinct districts and run the above query. I know I can do it with a procedure that generates the script on the fly; is there any other method?
I ask because the query "select distinct(District) from district_details" returns a single column with one district name per row, which I would like to transpose into columns.
You simply cannot have a static SQL statement returning a variable number of columns. You need to build such statement each time the number of different districts changes. To do that, you execute first a
SELECT DISTINCT District FROM district_details;
This will give you the list of districts where there are details. You then build a SQL statement iterating over the previous result (pseudocode)
statement = "SELECT name "
For each row returned in d = SELECT DISTINCT District FROM district_details
    statement = statement & ", SUM(IF(District=""" & d.District & """,1 ,0)) AS """ & d.District & """"
Next
statement = statement & " FROM district_details GROUP BY name;"
Then execute that query. Your code will also have to handle the variable number of columns in the result.
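The two-step approach can be sketched in application code as follows. This example uses Python's `sqlite3` instead of MySQL (an assumption made so it is self-contained), so it uses `CASE WHEN` in place of MySQL's `IF()`. The helpers `quote_ident` and `pivot_by_district` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE district_details (sno INTEGER, name TEXT, district TEXT)"
)
conn.executemany(
    "INSERT INTO district_details VALUES (?, ?, ?)",
    [(1, "Mike", "CA"), (2, "Mike", "CA"), (3, "Proctor", "JB"),
     (4, "Luke", "MN"), (5, "Luke", "MN"), (6, "Mike", "CA"),
     (7, "Mike", "LP"), (8, "Proctor", "MN"), (9, "Proctor", "JB"),
     (10, "Proctor", "MN"), (11, "Luke", "MN")],
)


def quote_ident(name):
    """Quote a value for use as a column alias."""
    return '"' + name.replace('"', '""') + '"'


def pivot_by_district(conn):
    # Step 1: find the districts that currently exist.
    districts = [row[0] for row in conn.execute(
        "SELECT DISTINCT district FROM district_details ORDER BY district")]
    # Step 2: build one counting column per district. District values are
    # bound as parameters; only the aliases are spliced into the SQL text.
    select_list = ["name"] + [
        f"SUM(CASE WHEN district = ? THEN 1 ELSE 0 END) AS {quote_ident(d)}"
        for d in districts
    ]
    sql = (f"SELECT {', '.join(select_list)} "
           "FROM district_details GROUP BY name ORDER BY name")
    cursor = conn.execute(sql, districts)
    header = [column[0] for column in cursor.description]
    return header, cursor.fetchall()


header, rows = pivot_by_district(conn)
print(header)
for row in rows:
    print(row)
```

Because the column list is rebuilt on every call, a newly added district appears in the output without any change to the code.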
a) "For each " is not supported in MySQL stored procedures.
b) Stored procedures can run dynamic SQL only through PREPARE and EXECUTE on a string literal or user variable, and their OUT parameters cannot carry more than one row.
c) Stored functions cannot execute dynamic SQL at all.
It is a nightmare to keep track of: once you get a good idea, everyone seems to debunk it before they think, "Why would anyone wanna..."
I hope you find your solution; I am still searching for mine.
The closest I got was
(excuse the pseudo code)
-> to stored procedure, build function that...
1) create temp table
2) load data to temp table from columns using your if statements
3) load the temp table out to INOUT or OUT parameters in a stored procedure as you would a table call... IF you can get it to return more than one row
Also another tip...
Store your districts in a conventional table, load it, and loop through the districts marked active to dynamically concatenate a query string, which could be plain text for all the system cares.
Then use:
prepare stmName from @yourquerystring;
execute stmName;
deallocate prepare stmName;
(find much more on the stored procedures part of the mysql forum too)
to run a different set of districts every time, without having to re-design your original proc
Maybe it's easier in numerical form.
I work on plain text content in my tables and have nothing to sum, count or add up
The following assumes you want matches of distinct (name/district) pairs. I.e. Luke/CA and Duke/CA would yield two results:
SELECT name, District, count(District) AS count
FROM district_details
GROUP BY District, name
If this is not the case simply remove name from the GROUP BY clause.
Lastly, notice that I switched sum() for count() as you are trying to count all of the grouped rows rather than getting a summation of values.
Thanks to the comment by @cballou above, I was able to write this sort of query, which is not exactly what the OP asked for but suited my similar situation, so I am adding it here to help those who come after.
Normal select statement:
SELECT d.id ID,
q.field field,
q.quota quota
FROM defaults d
JOIN quotas q ON d.id=q.default_id
Vertical results:
ID field quota
1 male 25
1 female 25
2 male 50
Select statement using group_concat:
SELECT d.id ID,
GROUP_CONCAT(q.field SEPARATOR ",") fields,
GROUP_CONCAT(q.quota SEPARATOR ",") quotas
FROM defaults d
JOIN quotas q ON d.id=q.default_id
GROUP BY d.id
Then I get comma-separated fields of "fields" and "quotas" which I can then easily process programmatically later.
Horizontal results:
ID fields quotas
1 male,female 25,25
2 male 50
Magic!