tablesaw library: join tables with different column types

I need to use a dataframe library to join the results of two different queries, or a query result with a JSON result. I did some research and found the tablesaw library. During the implementation, I ran into some problems.
Does anyone have experience with this library?
How can I join tables whose join columns have different types?
For example, the join column is an integer type in the first table and a long type in the second, and I want to join the tables on those columns. Now I am getting the following error:
Exception in thread "main" java.lang.ClassCastException: tech.tablesaw.index.FloatIndex cannot be cast to tech.tablesaw.index.IntIndex

Related

How to flatten tables correctly in BigQuery?

I have the following tables:
In table 2 (yellow looking fields), the first field is part of the following:
name1 RECORD NULLABLE
name1.name2 RECORD REPEATED
name1.name2.date_inserted TIMESTAMP NULLABLE
As you can see, the last (sub-row?) of row 25 is greyed out because it is part of the repeated record name1.name2.
I am trying to join table 2 with table 1 (orange-looking fields) on another field. I have no experience with records or repeated records, but using FLATTEN() I managed to join them.
The problem is, I noticed that some dates from the 2nd table return NULL after the join, although there aren't any NULLs before it. Since I can't figure out what the greyed cells are, I guess I am doing something wrong.
All this sums up to: How can I totally flatten all tables that I want to use so that there won't be any records at all and so I can go through the data with simple SQL statements? Please provide an example as well. Looking for something generic.
How can I totally flatten all tables that I want to use so that there won't be any records at all and so I can go through the data with simple SQL statements?
It really depends on the schemas you are working with. You can preprocess them, flatten the arrays and rename the struct fields, then use that as your base table to work with simple SQL statements.
For your scenario, you can start by flattening the name2 column of table 2 like this:
SELECT
  name2.date_inserted  -- add any other fields you want in the result
FROM table2,
  UNNEST(table2.name1.name2) AS name2  -- one output row per element of the repeated record
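The comma in the FROM clause acts as a CROSS JOIN, which drops rows whose repeated field is empty. You can use an explicit CROSS JOIN or a LEFT JOIN to further adjust your results; a LEFT JOIN keeps those rows, with NULLs in the flattened fields.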
Please provide an example as well. Looking for something generic.
I'm not sure about a generic approach, since each schema would probably have distinct requirements. The key concept is to know how to flatten arrays and how to query structs with arrays and arrays of structs.
You can find plenty of examples in the BigQuery documentation.
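To make the idea of flattening a repeated record concrete, here is a minimal sketch in plain Python rather than BigQuery SQL. The function name `flatten_repeated`, the `keep_empty` flag, and the sample rows are all invented for illustration; only the field names name1, name2 and date_inserted come from the question. It shows why a cross-join style flatten loses parent rows with an empty repeated field, while a left-join style flatten keeps them with empty values.

```python
def flatten_repeated(rows, record, repeated, keep_empty=False):
    """Yield one flat dict per element of row[record][repeated].

    Nested names are joined with underscores, so name1.name2.date_inserted
    becomes name1_name2_date_inserted. Other fields inside `record` are
    ignored to keep the sketch short.
    keep_empty=False behaves like a CROSS JOIN (parents without elements vanish);
    keep_empty=True behaves like a LEFT JOIN (they survive with None values).
    """
    prefix = f"{record}_{repeated}_"
    # Collect every child field name so that empty parents get the same keys.
    child_fields = sorted(
        {
            key
            for row in rows
            for element in (row.get(record) or {}).get(repeated) or []
            for key in element
        }
    )
    for row in rows:
        base = {k: v for k, v in row.items() if k != record}
        elements = (row.get(record) or {}).get(repeated) or []
        if not elements and keep_empty:
            yield {**base, **{prefix + f: None for f in child_fields}}
        for element in elements:
            yield {**base, **{prefix + f: element.get(f) for f in child_fields}}


# Hypothetical sample data: row 2 has an empty repeated record.
table2 = [
    {"id": 1, "name1": {"name2": [{"date_inserted": "2020-01-01"},
                                  {"date_inserted": "2020-01-02"}]}},
    {"id": 2, "name1": {"name2": []}},
]

cross = list(flatten_repeated(table2, "name1", "name2"))
left = list(flatten_repeated(table2, "name1", "name2", keep_empty=True))
```

With this sample, `cross` contains only the two rows for id 1, while `left` also contains a row for id 2 whose date is empty.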

Postgres copy to select * with a type cast

I have a pair of SQL tables in Postgres: a staging table and the main table. Among the variety of reasons for the staging table, the data I am uploading has irregular and differing formats for all of the date columns. During the upload process these values go into the staging table as varchars to be manipulated into usable formats.
In the main table the date fields are of type 'date'; in the staging table they are of type varchar.
The question is, does Postgres support a copy expression similar to
insert into production_t select *,textdate::date from staging_t
I need to change the format of a single field during the copy process. I know I can individually type out all of the column names during the insert and typecast the date columns there, but this table has over 200 columns and is one of 10 tables with similar issues. I want to accomplish this insert+typecast in one line that I can apply to all tables rather than having to type 2000+ lines of SQL queries.
You have to write every column of such a query, there is no shorthand.
May I say that a design with 200 columns is questionable.
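Since every column has to be listed, one way to avoid typing the list by hand is to generate the statement. The following is only a sketch: `build_insert` is a hypothetical helper, the column names other than textdate are invented, and it assumes plain lowercase identifiers that need no quoting. In practice the column list could be read from information_schema.columns.

```python
def build_insert(target, source, columns, date_columns):
    """Build an INSERT ... SELECT that casts the given columns to date.

    `columns` is the ordered list of column names shared by both tables;
    `date_columns` is the subset stored as varchar in the staging table.
    """
    column_list = ", ".join(columns)
    select_list = ", ".join(
        f"{col}::date" if col in date_columns else col for col in columns
    )
    return (
        f"INSERT INTO {target} ({column_list})\n"
        f"SELECT {select_list}\n"
        f"FROM {source}"
    )


# Hypothetical three-column example; a real table would have many more.
sql = build_insert(
    "production_t", "staging_t", ["id", "name", "textdate"], {"textdate"}
)
print(sql)
```

The same helper can be called once per table, so the ten tables need ten calls rather than ten hand-written column lists.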

(SQL) How do I differentiate 2 columns from different tables with the same name when selecting?

I'm using an Oracle 12c SQL server. The goal is to create a view containing each company and the drugs it produces.
How can I differentiate two columns with the exact same name but located in different tables using SELECT?
All relevant code below, including results with error.
I understand why I might be getting a duplicate name error as they both have the same header "name", but thought I handled it by identifying the table beforehand (i.e. pc.name and dg.name). Help!
SQL Tables Being Joined:
SQL Column Naming Error:
You have ambiguous column names in output from your view:
pc.name, dg.name
Adding alias for columns should solve this:
pc.name as pc_name, dg.name as dg_name
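A small runnable sketch of the aliasing fix follows. It uses Python's built-in sqlite3 module rather than Oracle, and the table names, view name, and sample rows are assumptions made up for the example; only the aliases pc and dg and the column name come from the question. SQLite handles duplicate view column names differently from Oracle, so this only shows that the aliased view exposes two distinct column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pharma_company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE drug (id INTEGER PRIMARY KEY, company_id INTEGER, name TEXT);
    INSERT INTO pharma_company VALUES (1, 'Acme Pharma');
    INSERT INTO drug VALUES (10, 1, 'Examplin');

    -- Both source columns are called "name", so each gets its own alias.
    CREATE VIEW company_drugs AS
    SELECT pc.name AS pc_name, dg.name AS dg_name
    FROM pharma_company pc
    JOIN drug dg ON dg.company_id = pc.id;
    """
)

cur = conn.execute("SELECT * FROM company_drugs")
view_columns = [d[0] for d in cur.description]  # column names exposed by the view
view_rows = cur.fetchall()
print(view_columns, view_rows)
```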

SQL join fails on temp tables

I'm having trouble joining two temp tables where all columns are varchar(MAX).
I'm trying to join on columns that both contain the value 'SV-001'. When I change the value to 'SV-0' there are no problems, but when I add one more '0' it fails.
Values on both tables are collected from different standard tables and I have tested the results before I join them - even excel can compare values, so I'm sure values are identical.
I'm joining them like this:
SELECT *
FROM #Speedwell_setup
JOIN #Speedwell_data ON #Speedwell_setup.Productcode = #Speedwell_data.Product1
All I get is an empty result, no errors or anything. I hope that you can help me out here.
Thanks in advance

T-SQL - Using Column Name in where condition without any references when having multiple tables that are engaged using multiple joins

I am new to T-SQL. I came across a stored procedure that joins multiple tables, but its WHERE clause compares a column to an incoming variable without any table reference, like
where UserId = @UserId
instead of using a table reference, like
where a.UserId = @UserId
Can anyone please refer me to any material that explains this?
If the query works, it means that only one of the joined tables has a column named UserId. If several tables have a column with the same name, you have to reference the table too.
If you don't specify the table reference in that case, you will get this error:
Ambiguous column name 'UserId'.
It means that two or more of the joined tables have a column named UserId.
Anyway, always try to qualify columns with the table reference.
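The behaviour can be reproduced with a short sketch. It uses Python's built-in sqlite3 module instead of SQL Server, so the error text differs slightly from the T-SQL message, and the tables users, orders and audit are hypothetical. The first query leaves UserId unqualified and works because only one joined table has that column; the second fails because both joined tables have it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (UserId INTEGER, UserName TEXT);
    CREATE TABLE orders (OrderId INTEGER, OwnerId INTEGER);
    CREATE TABLE audit (AuditId INTEGER, UserId INTEGER);
    INSERT INTO users VALUES (1, 'ann');
    INSERT INTO orders VALUES (100, 1);
    INSERT INTO audit VALUES (500, 1);
    """
)

# Only users has a UserId column in this join, so the bare name resolves.
ok_rows = conn.execute(
    "SELECT o.OrderId FROM users u JOIN orders o ON o.OwnerId = u.UserId "
    "WHERE UserId = ?",
    (1,),
).fetchall()

# Both users and audit have UserId, so the bare name is ambiguous.
try:
    conn.execute(
        "SELECT a.AuditId FROM users u JOIN audit a ON a.UserId = u.UserId "
        "WHERE UserId = ?",
        (1,),
    ).fetchall()
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(ok_rows, error_message)
```

Writing `u.UserId` in the second WHERE clause removes the ambiguity.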