Issues with imported CSV string values in DBeaver - SQL

I've connected to some downloaded CSVs in DBeaver. All of the connected values come in as "Strings". I aliased the names of the CSVs first:
SELECT *
FROM us;

SELECT *
FROM "us-counties-2023" AS usc;

SELECT *
FROM "us-states" AS ust;
But then I can't perform any JOINs or figure out how to cast the data. The data looks like this. How can I cast or convert this to usable data that I can perform JOINs and aggregate functions on?
date cases deaths
1/21/2020 1 0
1/22/2020 1 0
1/23/2020 1 0
1/24/2020 2 0
1/25/2020 3 0
1/26/2020 5 0
1/27/2020 5 0
1/28/2020 5 0
1/29/2020 5 0
1/30/2020 6 0
1/31/2020 7 0
2/1/2020 8 0
I attempted to CAST the data, but it keeps throwing errors. I expected to be able to CAST the data to the type I need (Date, Integer, etc.) and then perform some JOINs and other functions.
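As a minimal sketch of the general casting approach (using SQLite in Python as a stand-in engine; DBeaver's CSV driver has its own quirks, and the table name and sample rows here are invented for illustration):

```python
import sqlite3

# Every CSV column arrives as text; cast on the fly inside the query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE usc (date TEXT, cases TEXT, deaths TEXT)")
con.executemany("INSERT INTO usc VALUES (?, ?, ?)",
                [("1/21/2020", "1", "0"), ("1/30/2020", "6", "0")])

# Casting the string columns makes aggregates (and joins on the
# converted values) work:
total = con.execute("SELECT SUM(CAST(cases AS INTEGER)) FROM usc").fetchone()[0]
print(total)  # 7
```

The same `CAST(col AS INTEGER)` pattern can be used inside a JOIN condition or a GROUP BY query.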

Related

Generating columns for daily stats in SQL

I have a table that currently looks like this (simplified to illustrate my issue):
Thing | Date
1     | 2022-12-12
2     | 2022-11-05
3     | 2022-11-18
4     | 2022-12-01
1     | 2022-11-02
2     | 2022-11-21
5     | 2022-12-03
5     | 2022-12-08
2     | 2022-11-18
1     | 2022-11-20
I would like to generate the following:
Thing | 2022-11 | 2022-12
1     | 2       | 1
2     | 3       | 0
3     | 1       | 0
4     | 0       | 1
5     | 0       | 2
I'm new to SQL and can't quite figure this out - would I use some sort of FOR loop equivalent in my SELECT clause? I'm happy to figure out the exact syntax myself, I just need someone to point me in the right direction.
Thank you!
You can use conditional aggregation, as follows:
SELECT Thing,
       COUNT(CASE WHEN Date BETWEEN '2022-11-01' AND '2022-11-30' THEN 1 END) AS '2022-11',
       COUNT(CASE WHEN Date BETWEEN '2022-12-01' AND '2022-12-31' THEN 1 END) AS '2022-12'
FROM table_name
GROUP BY Thing
ORDER BY Thing;
COUNT counts only non-NULL values; for each row that does not match the condition, the CASE expression returns NULL, so that row is not counted.
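The conditional aggregation above can be verified end to end against the sample data (a sketch using SQLite via Python; the table name `things` is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE things (Thing INTEGER, Date TEXT)")
con.executemany("INSERT INTO things VALUES (?, ?)", [
    (1, "2022-12-12"), (2, "2022-11-05"), (3, "2022-11-18"),
    (4, "2022-12-01"), (1, "2022-11-02"), (2, "2022-11-21"),
    (5, "2022-12-03"), (5, "2022-12-08"), (2, "2022-11-18"),
    (1, "2022-11-20"),
])

# One COUNT(CASE ...) column per month bucket; non-matching rows yield
# NULL inside COUNT and so are skipped.
rows = con.execute("""
    SELECT Thing,
           COUNT(CASE WHEN Date BETWEEN '2022-11-01' AND '2022-11-30' THEN 1 END),
           COUNT(CASE WHEN Date BETWEEN '2022-12-01' AND '2022-12-31' THEN 1 END)
    FROM things
    GROUP BY Thing
    ORDER BY Thing
""").fetchall()
print(rows)  # [(1, 2, 1), (2, 3, 0), (3, 1, 0), (4, 0, 1), (5, 0, 2)]
```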

Add column with fixed values for each value of another column in Redshift

I have the following table:
[table screenshot not included]
I want to add a date range for each user. How can I achieve this?
If this is possible with a query in Redshift, that would be useful.
If not, what is an efficient way to create this in Python pandas? The data has around 8 lakh (800,000) records.
Given this dataframe df:
userid username
0 1 a
1 2 b
2 3 c
you can use numpy repeat and tile:
import numpy as np
import pandas as pd

dr = pd.date_range('2020-01-01', '2020-01-03')
df = (pd.DataFrame(np.repeat(df.to_numpy(), len(dr), 0), columns=df.columns)
        .assign(date=np.tile(dr.to_numpy(), len(df))))
Result:
userid username date
0 1 a 2020-01-01
1 1 a 2020-01-02
2 1 a 2020-01-03
3 2 b 2020-01-01
4 2 b 2020-01-02
5 2 b 2020-01-03
6 3 c 2020-01-01
7 3 c 2020-01-02
8 3 c 2020-01-03
In SQL this is simple too: just cross join with the list of dates you want to add to each row (replicating rows). You can see in your example that 3 rows and 3 dates result in 9 rows. (Untested explanatory code:)
SELECT userid, username, d."date"
FROM <table>
CROSS JOIN (
    SELECT '2020-01-01'::date AS "date"
    UNION ALL SELECT '2020-01-02'::date
    UNION ALL SELECT '2020-01-03'::date
) d;
Now the problem with the simple approach is that if you are dealing with large tables and long lists of dates, the multiplication will kill you. 10 billion rows by 5,000 dates is 50 trillion resulting rows; making this will take a long time and storing it will take lots of disk space. For small tables and short lists of dates this works fine.
If you are on the "big" side of things you will likely need to rethink what you are trying to do. Since you are using Redshift, there is a good chance you are working at that scale.
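The cross-join row multiplication can be sketched runnably (SQLite via Python as a stand-in; the `users` table name is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (userid INTEGER, username TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# Cross join 3 users with 3 dates -> 9 rows, one per (user, date) pair.
rows = con.execute("""
    SELECT userid, username, d.d
    FROM users
    CROSS JOIN (SELECT '2020-01-01' AS d
                UNION ALL SELECT '2020-01-02'
                UNION ALL SELECT '2020-01-03') d
    ORDER BY userid, d.d
""").fetchall()
print(len(rows))   # 9
print(rows[0])     # (1, 'a', '2020-01-01')
```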

Add multiple future dates at one time into redshift SQL

I have a table with details as follows:
Date Campaign Visits Orders Revenue
.... .... .... .... ....
Jun-18 Promotion01 10 1 120
Let's say it's called table A.
Now, for reporting purposes, I would like to add new dates as follows:
Date Campaign Visits Orders Revenue
Jul-18 NULL 0 0 0
Aug-18 NULL 0 0 0
Sep-18 NULL 0 0 0
.... .... .... .... ....
Dec-18 NULL 0 0 0
I would like to use UNION to add in only the date data.
I tried the DATEADD function in Amazon Redshift with the following command:
SELECT
to_char(dateadd(month, 18, '01-01-2017'),'yyyy-MM') as plus30,
NULL,
0,
0,
0
It returns the date; however, it returns only one row, i.e.
Date Campaign Visits Orders Revenue
Jul-18 NULL 0 0 0
If I want to return multiple rows as shown above, rather than adding them one by one, what should I do?
Many thanks for your help!
Frankly, the easiest way to do these sorts of tasks is to make an Excel spreadsheet, fill in all the desired values, save it as CSV, and then use COPY to load it into Redshift. This gives you the benefit of a nice interface to create the data without having to play around with SQL.
The reason you are only getting one row is that SELECT normally works off a table of data, returning one result per row of input data. You have not specified a table, so it returns only one row.
Fortunately, you can use generate_series() in some situations to simulate data:
SELECT
to_char(date '01-01-2017' + counter * interval '1 month','yyyy-MM') as plus30,
NULL,
0,
0,
0
FROM generate_series(1, 12) as counter
This generates:
2017-02 (null) 0 0 0
2017-03 (null) 0 0 0
2017-04 (null) 0 0 0
2017-05 (null) 0 0 0
2017-06 (null) 0 0 0
2017-07 (null) 0 0 0
2017-08 (null) 0 0 0
2017-09 (null) 0 0 0
2017-10 (null) 0 0 0
2017-11 (null) 0 0 0
2017-12 (null) 0 0 0
2018-01 (null) 0 0 0
(Your question showed yyyy-MM in the SQL but Mon-YY in the output, so please adjust accordingly. It is normally better to use yyyy-MM because it sorts correctly as a string.)
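Note that generate_series() runs only on the Redshift leader node, so it cannot be combined with regular table scans; a common workaround is a recursive CTE or a numbers table. A sketch of the recursive-CTE equivalent, using SQLite via Python as a convenient stand-in:

```python
import sqlite3

# Generate 12 consecutive months starting after 2017-01, mirroring the
# generate_series(1, 12) example above.
con = sqlite3.connect(":memory:")
rows = con.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM counter WHERE n < 12
    )
    SELECT strftime('%Y-%m', date('2017-01-01', '+' || n || ' months'))
    FROM counter
""").fetchall()
print(rows[0][0], rows[-1][0])  # 2017-02 2018-01
```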

Collating data in SQL Server

I have the following data in SQL Server
St 1 2 3 4 5 6 7 8
===========================================
603 2 5 1.5 3 0 0 0 0
603 0 0 0 0 2 1 3 5
As I insert the data in batches, each batch only has 4 columns, and I want to collate the data into the following:
St 1 2 3 4 5 6 7 8
===========================================
603 2 5 1.5 3 2 1 3 5
but most of the threads I see here are about concatenating strings in a single column.
Does anyone have an idea how to collate or merge these rows into a single row?
You can use GROUP BY and SUM in T-SQL:
SELECT ST, SUM(COL1), SUM(COL2), ... FROM tbl GROUP BY ST
You can use the GROUP BY clause and aggregate fields 1-8 with SUM (the numeric column names need brackets in T-SQL, otherwise SUM(1) just sums the literal 1):
SELECT St, SUM([1]), SUM([2]), ... FROM tbl GROUP BY St

SQL Server Row Concatenation

I have a table (a table variable, in fact) that holds several thousand (approx. 50k) rows of the form:
group (int) isok (bit) x y
20 0 1 1
20 1 2 1
20 1 3 1
20 0 1 2
20 0 2 1
21 1 1 1
21 0 2 1
21 1 3 1
21 0 1 2
21 1 2 2
Pulling this back to the client is a fairly hefty task (especially since isok is a bit). What I would like to do is transform it into the form:
group mask
20 01100
21 10101
And maybe go even a step further by encoding this into a long etc.
NOTE: The way in which the data is stored currently cannot be changed.
Is something like this possible in SQL Server 2005, and if possible even in 2000 (quite important)?
EDIT: I forgot to make it clear that the original table already has an implicit ordering that needs to be maintained. There isn't one column that acts as a linear sequence; rather, the ordering is based on two other columns (integers), x and y, as above.
You can treat the bit as a string ('0', '1') and deploy one of the many string aggregate concatenation methods described here: http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/
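As a client-side illustration of the target transform (the rows below mirror the sample data; in T-SQL the same result would come from one of the string-concatenation techniques linked above, with the concatenation ordered by x and y):

```python
from itertools import groupby

# Sample rows as (group, isok, x, y), already in the implicit order
# described in the question.
rows = [
    (20, 0, 1, 1), (20, 1, 2, 1), (20, 1, 3, 1), (20, 0, 1, 2), (20, 0, 2, 1),
    (21, 1, 1, 1), (21, 0, 2, 1), (21, 1, 3, 1), (21, 0, 1, 2), (21, 1, 2, 2),
]

# Concatenate the isok bits per group, preserving row order.
masks = {g: "".join(str(r[1]) for r in grp)
         for g, grp in groupby(rows, key=lambda r: r[0])}
print(masks)  # {20: '01100', 21: '10101'}

# Going a step further, as the question suggests: pack each mask into an integer.
packed = {g: int(m, 2) for g, m in masks.items()}
print(packed)  # {20: 12, 21: 21}
```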