Redshift query data format - sql

One of the columns has data in the following format:
column_name_a:
abcd/date=2018-01-01/part-0001-asdfasdfasdf
abcd/date=2018-01-01/part-0002-asdfasdfasdf
abcd/date=2018-01-02/part-0001-asdfasdfasdf
abcd/date=2018-01-02/part-0002-asdfasdfasdf
abcd/date=2018-01-03/part-0001-asdfasdfasdf
abcd/date=2018-01-03/part-0002-asdfasdfasdf
abcd/date=2018-01-03/part-0003-asdfasdfasdf
abcd/date=2018-01-03/part-0004-asdfasdfasdf
.....
Now I need to get the file count, either by day or by part number.
How do I write my query?

Adding to Nate's answer, you can nest split_part calls to get what you need:
To get date:
select split_part(split_part('abcd/date=2018-01-01/part-0001-asdfasdfasdf','/',2),'=',2)
To get part number:
select split_part(split_part('abcd/date=2018-01-01/part-0001-asdfasdfasdf','/',3),'-',2)
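To go from extraction to the file counts the question asks for, group on the nested expression and count rows (in Redshift, a GROUP BY on the split_part expression together with COUNT(*)). The Python sketch below only emulates that logic on the sample rows so it can be run anywhere. The split_part function here is a hypothetical helper that mimics Redshift's SPLIT_PART for positive part numbers; it is not Redshift itself.

```python
from collections import Counter


def split_part(s: str, delimiter: str, part: int) -> str:
    """Hypothetical stand-in for Redshift SPLIT_PART (1-based, positive part only).

    Like SPLIT_PART, it returns an empty string when there are fewer parts.
    """
    pieces = s.split(delimiter)
    return pieces[part - 1] if 1 <= part <= len(pieces) else ""


# Sample values copied from the question.
paths = [
    "abcd/date=2018-01-01/part-0001-asdfasdfasdf",
    "abcd/date=2018-01-01/part-0002-asdfasdfasdf",
    "abcd/date=2018-01-02/part-0001-asdfasdfasdf",
    "abcd/date=2018-01-02/part-0002-asdfasdfasdf",
    "abcd/date=2018-01-03/part-0001-asdfasdfasdf",
    "abcd/date=2018-01-03/part-0002-asdfasdfasdf",
    "abcd/date=2018-01-03/part-0003-asdfasdfasdf",
    "abcd/date=2018-01-03/part-0004-asdfasdfasdf",
]

# Equivalent of grouping by the date expression and counting rows.
files_per_day = Counter(split_part(split_part(p, "/", 2), "=", 2) for p in paths)

# Equivalent of grouping by the part-number expression and counting rows.
files_per_part = Counter(split_part(split_part(p, "/", 3), "-", 2) for p in paths)

print(dict(files_per_day))
print(dict(files_per_part))
```

On the sample data this gives 2, 2 and 4 files for the three days, and 3, 3, 1 and 1 files for parts 0001 to 0004.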

Use split_part. Note that the date value will still contain the 'date=' prefix.
select split_part(column_name_a,'/',2) as date_value,
split_part(column_name_a,'/',3) as part_number
from your_table
Details are here:
https://docs.aws.amazon.com/redshift/latest/dg/SPLIT_PART.html
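Because every value carries the same 'date=' prefix, grouping on the single-split value still gives correct per-day counts; the prefix only changes how the key looks. The Python sketch below illustrates that point under this assumption, with str.split standing in for SPLIT_PART (it is not Redshift code).

```python
from collections import Counter

# Sample values copied from the question.
paths = [
    "abcd/date=2018-01-01/part-0001-asdfasdfasdf",
    "abcd/date=2018-01-01/part-0002-asdfasdfasdf",
    "abcd/date=2018-01-02/part-0001-asdfasdfasdf",
    "abcd/date=2018-01-02/part-0002-asdfasdfasdf",
    "abcd/date=2018-01-03/part-0001-asdfasdfasdf",
    "abcd/date=2018-01-03/part-0002-asdfasdfasdf",
    "abcd/date=2018-01-03/part-0003-asdfasdfasdf",
    "abcd/date=2018-01-03/part-0004-asdfasdfasdf",
]

# One split on "/" (like split_part(column_name_a,'/',2)): the key keeps "date=".
per_day_raw = Counter(p.split("/")[1] for p in paths)

# Stripping the constant prefix changes the key text, not the counts.
PREFIX = "date="
per_day = Counter(
    key[len(PREFIX):] if key.startswith(PREFIX) else key
    for key in (p.split("/")[1] for p in paths)
)

print(dict(per_day_raw))
print(dict(per_day))
```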

Related

SQL group by middle part of string

I have a string column whose values usually look approximately like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group the data by source, which is part of the string: the four letters after "source=" (in the first example above: firm), and then simply count them. Is there a way to do this directly in SQL? I am using Hadoop.
The data is a set of strings like the ones above. My expected result is a summary table with two columns:
1) each source type (there are about 20 possible values and their lengths differ, so I cannot use a simple substring). Ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source=";
2) the count of their occurrences across all the strings.
There is just one source type in each string.
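You can use regexp_extract() with a capturing group (index 1 returns only the group, so no substr is needed), then group by the extracted value:
select regexp_extract(url, 'source=([^&]+)', 1) as source_type, count(*) as cnt
from your_table
group by regexp_extract(url, 'source=([^&]+)', 1)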
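The regex approach captures everything after "source=" up to the next "&", so it handles sources of any length. The Python sketch below mirrors the Hive call regexp_extract(url, 'source=([^&]+)', 1) on the three sample URLs; Hive uses Java regular expressions, and Python's re behaves the same way for this simple pattern. It is an illustration of the logic, not Hive itself.

```python
import re
from collections import Counter

# Sample values copied from the question.
urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]

# Group 1 is the value after "source=", up to the next "&" or the end of the string.
SOURCE_RE = re.compile(r"source=([^&]+)")


def extract_source(url: str) -> str:
    m = SOURCE_RE.search(url)
    # Hive's regexp_extract returns an empty string when nothing matches.
    return m.group(1) if m else ""


# Equivalent of GROUP BY on the extracted value with COUNT(*).
source_counts = Counter(extract_source(u) for u in urls)
print(dict(source_counts))
```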
In MSSQL you can use CHARINDEX to find the position of the substring and extract the value. Reading up to the next '&' (instead of a fixed 4 characters) handles sources of different lengths; url and your_table stand for your column and table:
;with cte as (
SELECT SUBSTRING(url, p.start_pos, CHARINDEX('&', url + '&', p.start_pos) - p.start_pos) AS ExtractString
FROM your_table
CROSS APPLY (SELECT CHARINDEX('&source=', url) + 8 AS start_pos) AS p
)
select ExtractString, count(*) as count from cte group by ExtractString;
HiveQL has an equivalent of CHARINDEX: the LOCATE function.
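The CHARINDEX/LOCATE approach is position arithmetic: find where "&source=" starts, skip its 8 characters, and read up to the next "&". Reading to the next delimiter rather than a fixed 4 characters matters because the question says the source names differ in length. The Python sketch below only illustrates that arithmetic: str.find stands in for CHARINDEX/LOCATE, and the longer value "newsletter" used in the checks is invented for illustration.

```python
from collections import Counter

# Sample values copied from the question.
urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]

MARKER = "&source="


def source_by_position(url: str) -> str:
    # str.find is 0-based and returns -1 when absent; CHARINDEX/LOCATE are
    # 1-based and return 0, but the arithmetic is the same idea.
    start = url.find(MARKER)
    if start == -1:
        return ""
    start += len(MARKER)  # first character of the value (the "+8")
    end = url.find("&", start)  # start of the next parameter, if any
    return url[start:] if end == -1 else url[start:end]


counts = Counter(source_by_position(u) for u in urls)
print(dict(counts))
```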

regex trim the part of the string sql

My data lives in BigQuery. One column needs regex extraction. Example values are below:
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hc_hr
src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_healthcare
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_hr
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hr_healthcare
My desired output is this:
my_campaign=goal
my_campaign=goal
Basically, I need to trim everything except my_campaign=goal.
The SQL I wrote is below:
LOWER(REGEXP_EXTRACT(my_column,r'my_campaign=([^&])')) AS my_campaign
It returns every my_campaign value (my_campaign=abb_hc_hr, my_campaign=goal_healthcare, etc.). How should I change the existing code to grab only my_campaign=goal?
Thank you.
Below is for BigQuery Standard SQL. You should use the following:
SELECT
LOWER(REGEXP_EXTRACT(my_column,r'(my_campaign=[^&]*)&?')) AS my_campaign
FROM your_table
WHERE LOWER(my_column) LIKE '%my_campaign=goal_%'
Applied to the sample data from your question, the output is:
Row my_campaign
1 my_campaign=goal_healthcare
2 my_campaign=goal_hr
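The Python sketch below mirrors the answer's pattern and LIKE filter on the four sample rows, so the output above can be checked. It also shows one possible tweak that is not from the answer: stopping the capture at "_" as well as "&" (pattern `(my_campaign=[^&_]*)`, which BigQuery's RE2 syntax also accepts) returns exactly my_campaign=goal, as the question asks. Python's re stands in for BigQuery here.

```python
import re

# Sample values copied from the question.
rows = [
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hc_hr",
    "src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_healthcare",
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_hr",
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hr_healthcare",
]

# Mirrors r'(my_campaign=[^&]*)&?': group 1 is the whole pair up to the next "&".
ANSWER_RE = re.compile(r"(my_campaign=[^&]*)&?")
# Hypothetical variant: also stop at "_" so only "my_campaign=goal" is kept.
PREFIX_RE = re.compile(r"(my_campaign=[^&_]*)")
# LIKE '%my_campaign=goal_%': "%" is any run of characters, "_" any single character.
LIKE_FILTER = re.compile(r"my_campaign=goal.", re.DOTALL)


def extract(pattern: re.Pattern, value: str):
    m = pattern.search(value)
    # LOWER(REGEXP_EXTRACT(...)); BigQuery returns NULL when there is no match.
    return m.group(1).lower() if m else None


matching = [r for r in rows if LIKE_FILTER.search(r.lower())]
answer_output = [extract(ANSWER_RE, r) for r in matching]
prefix_output = [extract(PREFIX_RE, r) for r in matching]
print(answer_output)
print(prefix_output)
```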

Getting an error when using CONCAT in BigQuery

I'm trying to run a query where I combine two columns and separate them with an x in between.
I'm also trying to get some other columns from the same table. However, I get the following error.
Error: No matching signature for function CONCAT for argument types: FLOAT64, FLOAT64. Supported signatures: CONCAT(STRING, [STRING, ...]); CONCAT(BYTES, [BYTES, ...]).
Here is my code:
SELECT
CONCAT(right,'x',left),
position,
numbercreated,
Madefrom
FROM
table
WHERE
Date = "2018-10-07%"
I have also tried adding a cast first, but that did not work:
SELECT Concast(cast(right,'x',left)), position,...
SELECT Concast(cast(right,'x',left)as STRING), position,...
Why am I getting this error?
Are there any fixes?
Thanks for the help.
You need to cast each value to STRING before the concat():
SELECT CONCAT(CAST(`right` as string), 'x', CAST(`left` as string)),
position, numbercreated, Madefrom
FROM table
WHERE Date = '2018-10-07%';
If you want a particular format, then use the FORMAT() function.
I also doubt that your WHERE will match anything. If Date is a string, then you probably want LIKE:
WHERE Date LIKE '2018-10-07%';
More likely, you should use the DATE function or direct comparison:
WHERE DATE(Date) = '2018-10-07'
or:
WHERE Date >= '2018-10-07' AND
Date < '2018-10-08'
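The three WHERE variants behave differently because "%" is only a wildcard for LIKE; with "=" it is compared as a literal character. The Python sketch below demonstrates this on invented sample values (they are not from the question), assuming Date holds ISO-formatted text, where lexicographic order matches chronological order.

```python
# Hypothetical Date values stored as ISO-formatted strings.
dates = [
    "2018-10-07 08:15:00",
    "2018-10-07",
    "2018-10-08 00:00:00",
    "2018-10-06 23:59:59",
]

# Date = '2018-10-07%': equality looks for the literal text, so nothing matches.
eq_matches = [d for d in dates if d == "2018-10-07%"]

# Date LIKE '2018-10-07%': a prefix match.
like_matches = [d for d in dates if d.startswith("2018-10-07")]

# Date >= '2018-10-07' AND Date < '2018-10-08': a half-open range.
range_matches = [d for d in dates if "2018-10-07" <= d < "2018-10-08"]

print(eq_matches, like_matches, range_matches)
```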
Another option to fix your CONCAT issue is to use the FORMAT function, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1.01 AS `right`, 2.0 AS `left`
)
SELECT FORMAT('%g%s%g', t.right, 'x', t.left)
FROM `project.dataset.table` t
The result will be:
Row f0_
1 1.01x2
Note: in this specific example you could use an even simpler statement:
FORMAT('%gx%g', t.right, t.left)
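BigQuery's FORMAT uses printf-style specifiers, and Python's % operator follows the same %g and %s conventions for these values, so it offers a quick local check of the expected output. This is an analogy, not BigQuery itself: %g drops trailing zeros, which is why 2.0 is rendered as "2".

```python
right, left = 1.01, 2.0  # same sample values as the WITH clause above

# '%g' drops trailing zeros; '%s' inserts the literal separator.
via_s = "%g%s%g" % (right, "x", left)

# The simpler form puts the "x" directly in the format string.
via_literal = "%gx%g" % (right, left)

print(via_s, via_literal)
```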
See the BigQuery FORMAT documentation for more on the supported format specifiers.
A few recommendations: try not to use keywords as column names or aliases. If you do use them for some reason, wrap them in backticks or prefix them with the table name or alias.
One more comment: it looks like you swapped the positions of your values; right is on the left side and left is on the right. That might be exactly what you need, but I wanted to mention it.
Try using SAFE_CAST, as below:
SELECT
CONCAT(SAFE_CAST(`right` as string),'x',SAFE_CAST(`left` as string)),
position,
numbercreated,
Madefrom
FROM
table
WHERE
Date = '2018-10-07'

Update Query to get rid of "AM" & "PM" in string

I wrote a basic update query:
Update WA SET WA.Time_Updated = Replace(Time_Updated, 'PM', ' ');
I don't get any real error message other than:
Microsoft can't update 251 records etc due to type conversion error
There are 5000 records in the table. The date column is Date/Time and all my other (non-date) columns are Short Text. The query does not update anything in the table; everything stays as it was before. Any ideas?
Just convert your text times to Date values:
Select *, TimeValue([Time_Updated]) As TimeUpdated From WA
Then, when you display TimeUpdated, format the value as you like.
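The idea is to parse the text once into a real time value and only apply a format (with or without AM/PM) when displaying it, instead of editing the stored text. The Python sketch below illustrates that idea; time_value is a hypothetical stand-in for Access TimeValue that assumes an "h:mm:ss AM/PM" text format, and the sample strings are invented.

```python
from datetime import datetime, time


def time_value(text: str) -> time:
    """Rough stand-in for Access TimeValue on 'h:mm:ss AM/PM' text (assumed format)."""
    return datetime.strptime(text.strip(), "%I:%M:%S %p").time()


# Hypothetical imported text values.
samples = ["12:03:00 PM", "1:30:15 PM", "9:05:00 AM"]
converted = [time_value(s) for s in samples]

# Format only when displaying, e.g. a 24-hour clock without AM/PM.
display = [t.strftime("%H:%M:%S") for t in converted]
print(display)
```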
You can work with the imported structure as it is.
Consider:
Hour("12:03:00 PM") + Minute("12:03:00 PM")/60 + Second("12:03:00 PM")/3600
This calculates to 12.05
So don't change the raw data; do the calculation in the query. Just use your field name in place of the literal value in the expression.
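The expression converts a clock time into decimal hours: hours, plus minutes divided by 60, plus seconds divided by 3600. The Python sketch below mirrors that arithmetic for "h:mm:ss AM/PM" text (an assumed format); strptime with %I and %p plays the role of Access's Hour, Minute and Second functions here.

```python
from datetime import datetime


def decimal_hours(text: str) -> float:
    """Mirror Hour(x) + Minute(x)/60 + Second(x)/3600 for 'h:mm:ss AM/PM' text."""
    t = datetime.strptime(text.strip(), "%I:%M:%S %p")
    return t.hour + t.minute / 60 + t.second / 3600


print(decimal_hours("12:03:00 PM"))  # 12 + 3/60
print(decimal_hours("1:30:00 PM"))   # 13 + 30/60
```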

Microsoft Access SQL Date Comparison

I am using Access 2007.
I need to return rows with a date/time field falling within a date range to be specified in query parameters.
The following doesn't error out, but doesn't appear to work.
SELECT FIELDS FROM FOO
WHERE (FOO.CREATED_DTG BETWEEN [START_DTG] And [END_DTG]);
Likewise this doesn't work for me
SELECT FIELDS FROM FOO
WHERE (FOO.CREATED_DTG >= [START_DTG] And FOO.CREATED_DTG < [END_DTG]);
How can I get this to work?
Update: Using CDate doesn't seem to make a difference.
Is BLAH the name of a field or a table? As you SELECT BLAH I imagine it names a field, but then BLAH.CREATED_DTG makes no sense -- do you mean FOO.CREATED_DTG perchance?
Do your dates start and end with a #?
Also, you have <= and >=; you probably only want = on one of these operators.
Are you sure the CREATED_DTG field is Date format?
Have you tried
WHERE (FOO.CREATED_DTG BETWEEN #01/01/1971# And #07/07/2009#);
(or whatever dates are appropriate; the point is to test without a parameter query)
Are [START_DTG] and [END_DTG] fields in the table FOO, or are they parameters? If they are parameters, then you need to declare their type in order to get validation of the input values. If so, you should add this before the first line of your SELECT statement:
PARAMETERS [START_DTG] DateTime, [END_DTG] DateTime;
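One plausible reason such a range query silently misbehaves is that the values end up compared as text rather than as dates; this is an assumption, not something confirmed in the thread. The Python sketch below shows both pitfalls with invented values. Compared as mm/dd/yyyy text, a 2008 date can sort between two 2009 bounds. A date-only upper bound also means midnight, so an inclusive BETWEEN drops rows later that day, while a half-open range against the next day keeps them.

```python
from datetime import datetime

FMT = "%m/%d/%Y"
start_text, end_text = "01/01/2009", "07/07/2009"
created_text = "02/15/2008"  # hypothetical row, outside the range

# Compared as text, "02/15/2008" sorts between the bounds: a wrong answer.
text_between = start_text <= created_text <= end_text

# Compared as dates, the row is correctly excluded.
start = datetime.strptime(start_text, FMT)
end = datetime.strptime(end_text, FMT)
created = datetime.strptime(created_text, FMT)
date_between = start <= created <= end

# A row stamped later on the end date (hypothetical).
late_row = datetime(2009, 7, 7, 15, 30)
inclusive_hit = start <= late_row <= end  # like BETWEEN: end is midnight
half_open_hit = start <= late_row < datetime(2009, 7, 8)  # >= start AND < next day

print(text_between, date_between, inclusive_hit, half_open_hit)
```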