Producing timestamp in correct format using Pentaho DI

I am using Pentaho Data Integration to get data from our online API. Part of the data is a timestamp, which the website shows as 1389227435641, but when it is written to a table it appears as 1.389227435641E12.
How do I get it to print the way it appears on the website instead of the way it does now?

Is the API returning the number in scientific notation?
If you want to convert 1.389227435641E12 to another format, import it as a BigNumber, then use a Select Values step to change its Format.
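The same idea can be sketched outside Pentaho in Python. This is a minimal illustration, not the Pentaho mechanism: `Decimal` keeps every digit of the scientific-notation string, and the "f" presentation type never writes an exponent. Reading the value as milliseconds since the Unix epoch is an assumption (the question does not say what the number means), and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def plain_digits(value: str) -> str:
    """Render a number such as '1.389227435641E12' without an exponent."""
    # Decimal stores the digits exactly; the "f" presentation type
    # always writes plain positional notation.
    return format(Decimal(value), "f")


def epoch_millis_to_utc(millis: int) -> datetime:
    """ASSUMPTION: the integer is milliseconds since 1970-01-01 UTC."""
    # timedelta arithmetic avoids float rounding of the fractional seconds.
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    digits = plain_digits("1.389227435641E12")
    print(digits)                            # 1389227435641
    print(epoch_millis_to_utc(int(digits)))  # 2014-01-09 00:30:35.641000+00:00
```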

Related

Converting a STRING to DATE in BigQuery

I have been struggling with some datasets I want to use because of a problem with their date format.
BigQuery could not load the files and returned the following error:
Could not parse '4/12/2016 2:47:30 AM' as TIMESTAMP for field date (position 1) starting at location 21 with message 'Invalid time zone: AM'
I was able to upload the file manually, but only with the fields as strings. Now I would like to set the fields back to the proper format; however, I could not find a way to change the date column from a string to a proper DATETIME type.
I would love to know whether this is possible, as the file is far too long to reformat in Excel or Google Sheets (as I have done with the smaller files from this dataset).
"now would like to set the fields back to the proper format ... from string to proper DateTime format"
Use PARSE_DATETIME('%m/%d/%Y %r', string_col) to parse a DATETIME out of the string.
Applied to the sample string in your question, it returns the DATETIME 2016-04-12 02:47:30.
As @Mikhail Berlyant rightly said, the PARSE_DATETIME('%m/%d/%Y %r', string_col) function converts your badly formatted dates into the ISO 8601-style format that Google BigQuery accepts. The best option is then to save these query results to a new table in your BigQuery project.
I had a similar issue.
I had uploaded my table with all columns in STRING format.
Next, I changed the query settings so that the query output was stored in a new table called heartrateSeconds_clean in the same dataset.
The "Write if empty" option is a good choice because it avoids overwriting the existing raw data or arbitrarily writing the output to a temporary table, unless you are sure you want to do so. Save the settings and run the query.
The schema of the new table is updated automatically to match the query output.
NB: I did not apply an ORDER BY clause to the results, so the data is not ordered by any specific column in either version of the table.
This dataset has over 2M rows.
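To sanity-check a format string locally before running the query, here is a rough Python analogue (an assumption for illustration, not BigQuery itself). BigQuery's %r is the 12-hour time with an AM/PM marker, which Python's `strptime` spells out as `%I:%M:%S %p`; the function name is hypothetical, and `%p` matching assumes the default C/English locale.

```python
from datetime import datetime

# BigQuery's "%m/%d/%Y %r" spelled out with Python strptime directives:
# %r is the 12-hour clock time with AM/PM, i.e. "%I:%M:%S %p".
FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_us_datetime(text: str) -> datetime:
    """Parse strings like '4/12/2016 2:47:30 AM' into a naive datetime."""
    # strptime accepts single-digit month, day and hour values.
    return datetime.strptime(text, FORMAT)


if __name__ == "__main__":
    print(parse_us_datetime("4/12/2016 2:47:30 AM").isoformat())  # 2016-04-12T02:47:30
    print(parse_us_datetime("4/12/2016 2:47:30 PM").isoformat())  # 2016-04-12T14:47:30
```

The PM line shows why the AM/PM marker matters: without `%p`, both strings would collapse to the same hour.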

Formatting a string to time on BigQuery?

I've got a huge (1.5 GB) CSV file with dates in the format 2014-12-25. I have managed to upload it to BigQuery with the type STRING for this column. I'm wondering whether I can transform this column in situ to a datetime format, without having to download the data, parse it, and send it back.
I have used the BigQuery GUI (newbie) but am happy to use the CLI if that makes it easier.
You can use BigQuery's date and time functions to "transform" a string-represented date into a timestamp.
For example:
SELECT '2014-12-25', TIMESTAMP('2014-12-25')
Added:
If you really need the date stored as a timestamp rather than a string, and the string data is already in BigQuery, you can run a query similar to the one below and write its results to a new table.
SELECT
TIMESTAMP(date_string) as date_timestamp,
< list all the rest of the fields >
FROM original_table
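What the query above does to each row can be mirrored in Python as a sketch. It assumes the column is called date_string as in the answer, that the strings are ISO dates such as 2014-12-25, and that the result should be midnight UTC (BigQuery's TIMESTAMP() uses UTC when the string carries no time zone). The helper name and the extra "amount" field are hypothetical.

```python
from datetime import datetime, timezone


def to_timestamp_row(row: dict) -> dict:
    """Mimic `SELECT TIMESTAMP(date_string) AS date_timestamp, <other fields>`."""
    # Carry every other field over unchanged ("< list all the rest of the fields >").
    out = {k: v for k, v in row.items() if k != "date_string"}
    # An ISO date parses to midnight; attach UTC explicitly.
    out["date_timestamp"] = datetime.fromisoformat(row["date_string"]).replace(
        tzinfo=timezone.utc
    )
    return out


if __name__ == "__main__":
    rows = [{"date_string": "2014-12-25", "amount": 10}]
    print([to_timestamp_row(r) for r in rows])
```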

Custom date format for loading data into BigQuery, using bq?

I'm uploading a CSV file to Google BigQuery using bq load on the command line. It's working great, but I've got a question about converting timestamps on the fly.
In my source data, my timestamps are formatted as YYYYMM, e.g. 201303 meaning March 2013.
However, Google BigQuery's timestamp fields are documented as only supporting Unix timestamps and YYYY-MM-DD HH:MM:SS format strings. So unsurprisingly, when I load the data, these fields don't convert to the correct date.
Is there any way I can convey to BigQuery that these are YYYYMM strings?
If not I can convert them before loading, but I have about 1TB of source data, so I'm keen to avoid that if possible :)
Another alternative is to load this field as a STRING and convert it to TIMESTAMP inside BigQuery itself, copying the data into another table with the following transformation (and deleting the original table afterwards):
SELECT TIMESTAMP(your_ts_str + "01") AS ts
An alternative to Mosha's answer can be achieved by:
SELECT DATE(CONCAT(your_ts_str, "01")) as ts
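Both answers rely on the same trick: append "01" so that a YYYYMM value becomes a full YYYYMMDD date on the first day of the month. A minimal Python sketch of that mapping (the helper name is hypothetical), handy for checking what the converted values should look like:

```python
from datetime import date, datetime


def yyyymm_to_date(text: str) -> date:
    """Turn '201303' into the first day of that month, 2013-03-01."""
    # Appending "01" mirrors the SQL answers; "%Y%m%d" then reads a
    # 4-digit year, a 2-digit month and a 2-digit day.
    return datetime.strptime(text + "01", "%Y%m%d").date()


if __name__ == "__main__":
    print(yyyymm_to_date("201303"))  # 2013-03-01
```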

BigQuery SUM(Float_Values) returns multiple decimal places and scientific notation

I am trying to calculate total sales at a store. The product price is in a column called UNIT_PRICE. All the prices have 2 decimal places, for example 34.54 or 19.99, and they are imported as type FLOAT in the schema (UNIT_PRICE:float).
When I run the query "SELECT CompanyName, SUM(Unit_Price) as sumValue" etc., I get the following returned in the column, but only "sometimes":
2.697829165015719E7
It should be something like: 26978291.65
As I am piping this out into spreadsheets and then charting it, I need it to be of type float or at least to look like a normal price.
I have tried the following but still having issues:
Source: I tried converting the original data type to BigDecimal with only 2 decimal places in the source data and then exporting to the CSV for import into BigQuery, but got the same result.
BigQuery: I tried converting to a string first, then to a float, and then applying SUM, but got the same result. "SELECT CompanyName, SUM(Float(String(Unit_Price))) as sumValue"
Any ideas on how to deal with this?
Thanks
BigQuery uses default formatting for floating-point numbers, which means that, depending on the size of the number, it may use scientific notation (see the %g format specifier).
We tried switching this, but it turns out to be hard to find a format that makes everyone happy. %f formatting always produces decimal notation, but it also pads to six digits after the decimal point and drops any digits beyond that precision.
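The %g/%f trade-off described above can be reproduced with Python's printf-style formatting, which follows the C conventions for these specifiers. This only illustrates the specifiers; it does not claim to reproduce BigQuery's exact output (the question shows 2.697829165015719E7, which carries more digits than %g's default six).

```python
# Python's printf-style "%" operator uses the C meanings of %g and %f.
total = 26978291.65015719  # the SUM from the question, as a double

print("%g" % 34.54)    # 34.54: small values stay positional
print("%g" % total)    # 2.69783e+07: exponent >= precision (6) switches to scientific
print("%f" % total)    # 26978291.650157: always positional, padded/cut to 6 decimals
print("%f" % 1e-7)     # 0.000000: digits beyond 6 decimals are dropped
print("%.2f" % total)  # 26978291.65: an explicit precision suits prices
```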
I've filed a bug to allow an arbitrary format string conversion function in BigQuery. It would let you run SELECT FORMAT_STRING("%08d", SUM(Unit_Price)) FROM ... in order to be able to control the exact format of the output.
Do you see this in the BQ browser tool or only on your spreadsheet?
A BigQuery FLOAT is 8 bytes in size, so it can hold numbers far larger than 9,000,000,000,000.
I find that Excel sometimes converts numbers to the format you mentioned when it opens a flat file (CSV). To verify that this is the case, open your CSV in Notepad (or another plain-text editor) before you try it in Excel.
If this is indeed the issue, you can configure the Excel connector to treat this field as a string instead of a number. Another option is to convert the number to a string by concatenating "" to it, so that the spreadsheet automatically treats it as a string. Afterwards, you can convert it back to a number in the spreadsheet.
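In the same spirit of converting the value to a string, the formatting can also be fixed before the file is written, so the CSV itself never contains scientific notation. A minimal Python sketch: the column names come from the question, while the helper and the company name "Acme" are hypothetical. Whether a spreadsheet later re-parses the text as a number still depends on its import settings.

```python
import csv
import io


def rows_to_csv(rows):
    """Write (company, total) pairs, rendering totals as fixed 2-decimal text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CompanyName", "sumValue"])
    for company, total in rows:
        # Formatting here means the file holds "26978291.65",
        # never an exponent form such as "2.69...E7".
        writer.writerow([company, f"{total:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv([("Acme", 26978291.65015719)]), end="")
```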
Thanks

Format a money field in SQL without converting to varchar?

I need to be able to display a money field as $XX,XXX.XX, but without converting it to varchar as in total_eval = '$' + CONVERT(varchar(19),total_eval.opvValueMoney,1)
After I pull this data, my project sorts on this column, and it doesn't sort correctly when the column is a varchar.
Is there any way to do this?
This is part of an ASP.NET system, but I have no access to or control over what happens after the information is returned.
Can you not format this when the data is printed on the screen? Then the number remains a number, and you can format it as you please at the presentation level.
For example, using PHP you could do something like this:
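echo money_format('$%i', 3.4); // echoes '$3.40' (money_format() was removed in PHP 8.0)
echo '$' . number_format(3.4, 2); // PHP 8+ alternative, also echoes '$3.40'
// ^ here is your number, no formatting from the db!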
This example was found in an answer to another question.
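The same presentation-level idea, sketched in Python rather than PHP, also shows why the varchar column sorts badly: sorting the numbers first keeps numeric order, while sorting already formatted strings compares them character by character. The function name and sample amounts are hypothetical.

```python
def format_money(value: float) -> str:
    """Presentation-only formatting: $XX,XXX.XX."""
    return f"${value:,.2f}"


amounts = [9.5, 10.0, 1234.567]

# Sort the numbers, then format: numeric order is preserved.
print([format_money(a) for a in sorted(amounts)])
# ['$9.50', '$10.00', '$1,234.57']

# Format, then sort: strings compare character by character.
print(sorted(format_money(a) for a in amounts))
# ['$1,234.57', '$10.00', '$9.50']
```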