I have two tables in a database in AWS Athena that I want to join.
I want to join them by several columns, one of them being date.
However, in one data set the date string for single-digit months is encoded as
"08/31/2018"
While the other would have it encoded as
"8/31/2018"
Is there a way to make them the same format?
I am unsure whether it is easier to add the extra 0 to strings which lack it or to remove it from strings which have it.
Based on what I have researched, I think I will have to use the CASE and CONCAT functions.
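For example, I imagine the padding expression would look something like this (the column name is just a placeholder for my actual date column):
-- Sketch of the CASE/CONCAT idea: pad the month part with a leading 0
-- when it is a single digit (date_variable is a placeholder name)
CASE WHEN length(split_part(date_variable, '/', 1)) = 1
     THEN concat('0', date_variable)
     ELSE date_variable
END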
Both of the tables were loaded into the database from a CSV file, and the variables are in the string format.
I have tried changing the values manually in the CSV file, tried running an R script on one of the tables to format the date in the same way, and have also tried re-loading the tables into the database as the same date format.
However, no matter what I do, whenever the data is loaded into the database, even when both columns have the same date type, they always end up with different formats: one with the extra 0 and the other without it.
The last avenue I haven't tried is through a SQL query.
However I am not well versed in Athena and am having a hard time formatting this query.
I know this is rather vague, so please ask me for more information if needed.
If someone could help me start this query I would be grateful.
Thank you for the help.
Here is the function for converting date strings in Athena:
date_parse(table.date_variable,'%m/%d/%Y')
Note, though, that Athena tables are immutable once created.
You can convert the value to date using date_parse(). So, this should work:
date_parse(t1.datecol, '%m/%d/%Y') = date_parse(t2.datecol, '%m/%d/%Y')
Having said that, you should fix the data model. Store dates as dates, not as strings! Then you can use an equality join, and that is just better all around.
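For example, the full join condition could look something like this (table, column, and key names are placeholders, assuming one extra join key):
-- A minimal sketch: date_parse() normalizes both string formats to
-- timestamps so the equality comparison works
SELECT t1.*
FROM table1 t1
JOIN table2 t2
  ON t1.id = t2.id
 AND date_parse(t1.date_variable, '%m/%d/%Y') = date_parse(t2.date_variable, '%m/%d/%Y')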
I want to start by saying my SQL knowledge is limited (the SoloLearn SQL basics course is the extent of it), and I have fallen into a position where I am regularly asked to pull data from the SQL database for our ERP software. I have been pretty successful so far, but my current problem is stumping me.
I need to filter my results by having the date match from 2 separate tables.
My issue is that one of the tables outputs DATETIME with full time data. e.g. "2022-08-18 11:13:09.000"
While the other table zeros the time data. e.g. "2022-08-18 00:00:00.000"
Is there a way I can on the fly convert these to just a DATE e.g. "2022-08-18" so I can set them equal and get the results I need?
A simple CAST statement should work if I understand correctly.
CAST(dateToConvert AS DATE)
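For example, the join might look something like this (table and column names are assumptions):
-- A small sketch: CAST strips the time portion from both DATETIME
-- columns so the equality join matches on the day alone
SELECT a.*
FROM TableA a
JOIN TableB b
  ON CAST(a.DateTimeCol AS DATE) = CAST(b.DateTimeCol AS DATE)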
I am trying to insert data from a staging table into the master table. The table has nearly 300 columns with a mix of data types: Varchar, Integer, Decimal, Date, etc.
Snowflake gives the unhelpful error message of "Numeric value '' is not recognized"
I have gone through and cut out various parts of the query to try and isolate where it is coming from. After several hours and cutting every column, it is still happening.
Does anyone know of a Snowflake diagnostic query (like Redshift has) which can tell me a specific column where the issue is occurring?
Unfortunately, not at the point you're at. If you went back to the COPY INTO that loaded the data, you'd be able to use the VALIDATE() function to get better information down to the record and byte-offset level.
I would query your staging table for just the numeric fields and look for blanks, or you can wrap every field destined for a numeric column in try_to_number(). A bit tedious, but it might not be too bad if you don't have a lot of numbers.
https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html
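For example, something like this would surface the offending rows (the table and column names are hypothetical):
-- Hedged sketch: find rows where a string destined for a numeric
-- column is non-blank but fails conversion
SELECT *
FROM staging_table
WHERE amount_raw IS NOT NULL
  AND TRY_TO_NUMBER(amount_raw) IS NULL;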
As a note, when you stage, you should use the NULL_IF options to get rid of bad characters, and/or try to load the data using the actual datatypes in your staging table, so you can leverage the VALIDATE() function to make sure the data types are correct before loading into Snowflake.
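A rough sketch of the NULL_IF idea (the stage, file, and table names are assumptions):
-- Convert empty strings and common junk tokens to NULL at load time
COPY INTO staging_table
FROM @my_stage/myfile.csv
FILE_FORMAT = (TYPE = CSV NULL_IF = ('', 'NULL', 'N/A'));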
Query your staging table using try_to_number() and/or try_to_decimal() for the number and decimal fields, and then use MINUS to get the difference:
SELECT $1, $2, ..., $300 FROM @stage
MINUS
SELECT $1, TRY_TO_NUMBER($2), ..., $300 FROM @stage
If any number field has a string that cannot be converted, it will become NULL, and the MINUS should return the rows which have a problem. Once you get those rows, analyze the columns in the result set for errors.
I have a variable "month" in a SQL DB with 2 different formats: yyyy-mm and yyyy-m (e.g., 2015-11, 2016-1). When I try to sort the "month" column (ascending or descending), it does not sort properly due to this format difference. How do I change the yyyy-m format to yyyy-mm?
If you can be certain that it's never going to be anything other than those exact formats you can insert a zero at a known point in the string -
SELECT MonthField,
       CASE WHEN LEN(MonthField) = 6
            THEN LEFT(MonthField, 5) + '0' + RIGHT(MonthField, 1)
            ELSE MonthField
       END AS MonthField_Cleaned
FROM SourceTable
But if at all possible, you'd be better off cleaning up whatever is creating the data, and ideally storing the values as actual dates rather than as strings representing months. That way they can't get mixed up, and you can use the DBMS's date manipulation functions.
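If you do want to treat the values as real dates, one option is to pad and convert in a single step, roughly like this (SQL Server syntax assumed, matching the LEN/+ usage above):
-- Sketch: normalize yyyy-m to yyyy-mm, append a day, and CAST to DATE
SELECT CAST(LEFT(MonthField, 5)
            + RIGHT('0' + SUBSTRING(MonthField, 6, 2), 2)
            + '-01' AS DATE) AS MonthStart
FROM SourceTable
ORDER BY MonthStart;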
I've found a few questions on this but none seem to fit my problem case quite right.
Overview: The data is in an Oracle 10g database, and the requirements include using MS Access as a front end.
Problem: The tables include date fields which are incompatible with MS Access. I NEED to run queries based on date and time in MS Access.
Details:
I'm not allowed to redesign the tables
Decided to create new tables on the server and run inserts from the old tables to the new
Probably sounds weird but given the constraints I'm allowed to do what I want if I duplicate the data
With the new tables, I want to take the date/time/timezone field from the old table and insert the date/time into the new one, stripping the timezone out into a field by itself
The big requirement is for the data to stay usable. If I do a TO_CHAR, it becomes a string, and I can't set up queries based on date and time against a static text field.
Any help is appreciated! Thanks !!!
If possible, the best way I have found to deal with these issues is to link to the table via a view. You can then present the data however you wish under the hood, without having to alter the table structures.
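A sketch of that approach, assuming the source column is a TIMESTAMP WITH TIME ZONE and using placeholder names throughout:
-- CAST to DATE keeps the date and time but drops the time zone and
-- fractional seconds, giving MS Access a type it can query
CREATE OR REPLACE VIEW orders_for_access AS
SELECT order_id,
       CAST(order_ts AS DATE) AS order_date
FROM orders;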
I found an answer for this. Looking here:
Oracle Date Functions
They give some samples wrapping TO_CHAR with TO_DATE. I formatted the value as text, stripping the time zone, then wrapped it with TO_DATE to convert it back to a date-and-time field that's compatible with MS Access. Here's the code:
SELECT TO_DATE(TO_CHAR(table.date, 'DD-MON-YYYY HH24:MI:SS'), 'DD-MON-YYYY HH24:MI:SS')
FROM table;
I'd like to know what my best option would be to import data from an Excel file on a weekly or monthly basis. At first, I thought I would use SSIS, but after much struggle with seemingly simple tasks, I'm starting to rethink my plan. Would it be better/easier to just write the SQL by hand or use the services of an SSIS package? The basic process will be as follows:
A separate process will download an .xls file to a local fileshare.
The xls file will have a filename like: 'myfilename MON YY'.
I will need to read the month and year from the filename, reformat it to a SQL date and then query a DimDate table to find the corresponding date key.
For each row (after the first 2 header rows), insert the data with the date key, unless the row is a total row, in which case ignore it.
Here are some of the issues I've been encountering with SSIS:
I can parse the date string from a flat file data source, but can't seem to do it with an Excel data source. Also, once parsed, I cannot seem to convert the string to a date in order to perform the lookup for the date key. For example, I want to do something like this:
select DateKey from DimDate
where ActualDate = convert(datetime, '01-' + 'JAN-10', 120)
but I don't think it is possible to use the 'convert' or 'datetime' keywords in an expression builder. I have also been unable to find where I can edit the SQL to ignore the first 2 rows of data.
I'm very skeptical of using SSIS because it seems like a kludgy way of doing something that can probably be accomplished more efficiently by writing the SQL yourself, but I may be forced to use SSIS. Thoughts?
SSIS is definitely the direction to go.
To hit on your problems: (DT_DBTIMESTAMP) is the conversion you want, though the syntax is a bit different. For instance, to convert your example date I would use:
(DT_DBTIMESTAMP)"01/01/2010"
If you use that expression in a derived column to replace your string date (or create a new column), you could then do a lookup against datetime columns in a DB.
If you need to exclude the first two rows, you will either need to write a SQL statement to query the file (as opposed to using the Excel file reader source) or use a conditional split to throw them away based on any condition that can be repeated with every import.
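For example, an Excel source in SQL-command mode can query a range that starts below the header rows (the sheet name and range are assumptions):
-- Skip the first 2 header rows by starting the range at row 3
SELECT *
FROM [Sheet1$A3:K65536]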
Flat files are easier to work with, and they do allow you to throw away a set number of initial rows.