Why are the results of the same query different in the BigQuery editor and SQLAlchemy?

My bigquery query is :
SELECT d.type AS `error_type`, count('d.type') AS `count`
FROM `table_android`, unnest(`table_android`.`exceptions`) AS `d`
WHERE `table_android`.`event_timestamp` BETWEEN '2022-12-15' AND '2022-12-20' GROUP BY `error_type` ORDER BY `count` desc;
This query works fine in the BigQuery editor, but I could not get the same results from the equivalent SQLAlchemy query.
sqlalchemy query :
sa.select(
sa.literal_column("d.type").label("Error_Type"),
sa.func.count("Error_Type").label("count"),
)
.select_from(sa.func.unnest(table_android.c.exceptions).alias("d"))
.group_by("Error_Type")
.order_by(desc("count"))
.where(table_android.c.event_timestamp.between('2022-12-15', '2022-12-20'))
.cte("main_table")
Correct result :
Wrong result:
I am using python-bigquery-sqlalchemy library. table_android.exceptions column struct is like that :
column types :
And this is render of sqlalchemy query :
SELECT `d`.`type` AS `Error_Type`, count(`d`.`type`) AS `count` FROM `table_android`, unnest(`table_android`.`exceptions`) AS `d` WHERE `table_android`.`event_timestamp` BETWEEN '2022-12-05' AND '2022-12-20' GROUP BY `Error_Type` ORDER BY `count` DESC
I see the correct result in the BigQuery editor, but SQLAlchemy does not return the correct result. How should I edit my SQLAlchemy query to get correct results?

I don't have bigquery so I can't test this but I think you want column and not literal_column. Also I think you can create an implicit CROSS JOIN by including both the table and the unnested column in the select_from.
# I'm testing with postgresql, but structs aren't easy to create there
from sqlalchemy import Column, Date, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql import column, func, select

metadata = MetaData()
table_android = Table(
    "table_android", metadata,
    Column("id", Integer, primary_key=True),
    Column("exceptions", postgresql.ARRAY(String)),
    Column("event_timestamp", Date),
)

# Alias the unnested array so it can be listed next to the table in FROM
d = func.unnest(table_android.c.exceptions).alias("d")
# You might be able to do d.column["type"].label("Error_Type")
type_col = column("d.type").label("Error_Type")
count_col = func.count(type_col).label("count")
print(
    select(type_col, count_col)
    # Listing both the table and the unnest creates the implicit CROSS JOIN
    .select_from(table_android, d)
    .group_by(type_col)
    # Descending, to match ORDER BY `count` desc in the original SQL
    .order_by(count_col.desc())
    .where(table_android.c.event_timestamp.between('2022-12-15', '2022-12-20'))
    .cte("main_table")
)
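To make the implicit CROSS JOIN idea concrete, here is a minimal pure-Python sketch of what `FROM table, unnest(table.exceptions) AS d` computes. It does not use BigQuery or SQLAlchemy; the rows and the `count_error_types` helper are hypothetical stand-ins invented for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical stand-in rows for table_android: each row carries an array of structs.
rows = [
    {"event_timestamp": date(2022, 12, 16),
     "exceptions": [{"type": "NullPointer"}, {"type": "Timeout"}]},
    {"event_timestamp": date(2022, 12, 18),
     "exceptions": [{"type": "Timeout"}]},
    {"event_timestamp": date(2022, 11, 1),  # outside the date range
     "exceptions": [{"type": "Timeout"}]},
]


def count_error_types(rows, start, end):
    """Model FROM table, UNNEST(table.exceptions) AS d with a filter on the table."""
    counts = Counter()
    for row in rows:  # FROM table_android
        if not (start <= row["event_timestamp"] <= end):  # WHERE ... BETWEEN
            continue
        for d in row["exceptions"]:  # , unnest(exceptions) AS d
            counts[d["type"]] += 1  # GROUP BY d.type with COUNT
    return counts.most_common()  # ORDER BY count DESC


result = count_error_types(rows, date(2022, 12, 15), date(2022, 12, 20))
print(result)
```

The point of the sketch is that the date filter applies to the parent row, so the table has to appear in FROM alongside the unnested column for the filter and the unnest to refer to the same rows.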

Related

How do SQL CASE and WHEN work in the Django ORM?

I have a SQL query and when writing in Django ORM it returns an error. But the SQL query works perfectly on MySQL Command Line Client. Would anyone please explain the error or working of CASE and When in Django ORM?
SQL query:-
SELECT CASE WHEN LENGTH(au.first_name) < 1 THEN au.username ELSE concat(au.first_name,' ',au.last_name)
END AS fullname FROM rewards_usercard ru RIGHT JOIN auth_user au ON ru.user_id = au.id;
Django Models code:-
from django.db.models import Case, When, Q, CharField, Value
from django.db.models.functions import Length, Concat
from django.db.models.lookups import LessThan
queryset = UserCard.objects.annotate(
full_name = Case(
When(condition=Q(
LessThan(Length('user__first_name'),1)
), then='user__username'),
default = Concat('user__first_name', Value(' '), 'user__last_name'),
output_field=CharField()
)
)
Error:-
cannot unpack non-iterable LessThan object
Try this:
from django.db.models.functions import Coalesce, Concat
from django.db.models import (Case, CharField, When, Value)
data = UserCard.objects.all().annotate(
    fullname=Coalesce(
        Case(
            When(user__first_name__in=[None, ""], then="user__username"),
            default=Concat('user__first_name', Value(" "), 'user__last_name'),
            output_field=CharField(),
        ),
        "user__username",
    )
).values('fullname')
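As a rough illustration of the SQL that a CASE expression wrapped in COALESCE boils down to, the sketch below runs an equivalent query against an in-memory SQLite database. This is an assumption-laden stand-in: SQLite replaces MySQL, `||` replaces `concat`, and the table contents are invented. It shows one plausible reason for an outer COALESCE: when `first_name` is NULL, the `LENGTH(...) < 1` test is NULL, the ELSE branch runs, and the concatenation is NULL as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_user (username TEXT, first_name TEXT, last_name TEXT)"
)
# Hypothetical users: a full name, an empty first name, and NULL names.
conn.executemany(
    "INSERT INTO auth_user VALUES (?, ?, ?)",
    [("ann01", "Ann", "Lee"), ("bob02", "", "Ray"), ("cat03", None, None)],
)

rows = conn.execute(
    """
    SELECT COALESCE(
             CASE WHEN LENGTH(first_name) < 1 THEN username
                  ELSE first_name || ' ' || last_name
             END,
             username) AS fullname
    FROM auth_user
    ORDER BY username
    """
).fetchall()
fullnames = [r[0] for r in rows]
print(fullnames)
```

The third user falls back to the username only because of the COALESCE; the CASE on its own would yield NULL for that row.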

Using WITH in MonetDB

I'm trying to execute the following query in MonetDB using "WITH":
with a as (select data_string from colombia.dim_tempo)
select
t.ano_mes
,f.sg_estado
,f.cod_produto
, sum(f.qtd_vendidas) as qtd_vendidas
, count(*) as fact_count
from colombia.fact_retail_market f, colombia.dim_tempo t
where f.cod_anomes = t.data_string
and t.data_string in (a.data_string)
group by
t.ano_mes
,f.sg_estado
,f.cod_produto ;
But I always get this message:
What's wrong with the statement?
The WHERE clause needs to be:
WHERE f.cod_anomes = t.data_string AND
t.data_string IN (SELECT data_string FROM a)
That is, IN needs to be followed by a subquery against the CTE.
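A minimal sketch of the corrected statement, run against an in-memory SQLite database rather than MonetDB. The `colombia.` schema prefix is dropped and the sample rows are invented for illustration; only the shape of the query follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_tempo (data_string TEXT, ano_mes TEXT);
    CREATE TABLE fact_retail_market (cod_anomes TEXT, sg_estado TEXT,
                                     cod_produto INTEGER, qtd_vendidas INTEGER);
    INSERT INTO dim_tempo VALUES ('202001', '2020-01'), ('202002', '2020-02');
    INSERT INTO fact_retail_market VALUES
        ('202001', 'SP', 1, 10),
        ('202001', 'SP', 1, 5),
        ('202002', 'RJ', 2, 7);
    """
)

rows = conn.execute(
    """
    WITH a AS (SELECT data_string FROM dim_tempo)
    SELECT t.ano_mes, f.sg_estado, f.cod_produto,
           SUM(f.qtd_vendidas) AS qtd_vendidas,
           COUNT(*) AS fact_count
    FROM fact_retail_market f, dim_tempo t
    WHERE f.cod_anomes = t.data_string
      -- the CTE is referenced like a table, inside a subquery
      AND t.data_string IN (SELECT data_string FROM a)
    GROUP BY t.ano_mes, f.sg_estado, f.cod_produto
    ORDER BY t.ano_mes
    """
).fetchall()
print(rows)
```

A CTE name behaves like a table name, so its columns are only reachable through a FROM clause, which is why `a.data_string` on its own is rejected.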

PostgreSQL: cannot order by json_build_object result (from a subquery)

I have a SQL query. And I'd like to order by json field:
SELECT "ReviewPacksModel"."id",
(SELECT json_build_object(
'totalIssues', COUNT(*),
'openIssues', COUNT(*) filter (where "issues".status = 'Open'),
'fixedIssues', COUNT(*) filter (where "issues".status = 'Fixed')
)
FROM "development"."issues" "issues"
JOIN "development"."reviewTasks" as "rt" ON "issues"."reviewTaskId" = "rt".id
WHERE "issues"."isDeleted" = false
AND "rt"."reviewPackId" = "ReviewPacksModel"."id"
) as "issueStatistic"
FROM "development"."reviewPacks" AS "ReviewPacksModel"
WHERE "ReviewPacksModel"."projectId" = '2'
AND "ReviewPacksModel"."mode" IN ('Default', 'Live')
AND "ReviewPacksModel"."status" IN ('Draft', 'Active')
ORDER BY "issueStatistic"->'totalIssues'
LIMIT 50;
And I get an error:
ERROR: column "issueStatistic" does not exist
If I try to order by issueStatistic without ->'totalIssues', I will get another error:
ERROR: could not identify an equality operator for type json
It seems like I cannot extract field from the JSON.
I also tested it with this query:
SELECT "ReviewPacksModel".*,
(SELECT Count(*)
FROM "development"."issues" "issues"
JOIN "development"."reviewTasks" as "rt" ON "issues"."reviewTaskId" = "rt".id
WHERE "issues"."isDeleted" = false
AND "rt"."reviewPackId" = "ReviewPacksModel"."id"
) AS "issueStatistic"
FROM "development"."reviewPacks" AS "ReviewPacksModel"
WHERE "ReviewPacksModel"."projectId" = '2'
AND "ReviewPacksModel"."mode" IN ('Default', 'Live')
AND "ReviewPacksModel"."status" IN ('Draft', 'Active')
ORDER BY "issueStatistic"
LIMIT 50;
And it works without any problems. But I cannot use it because a subquery cannot return multiple columns. I also tried alternatives like array_agg, json_agg, etc., but they don't help.
I know that it's possible to make multiple queries, but they aren't very fast, so I would prefer to use json_build_object.
You can use aliases in ORDER BY, but you cannot use expressions involving aliases.
You'll have to use a subquery.
Also, you cannot order on a json. You'll have to convert it to a sortable data type. In the following I assume it is a number; you'll have to adapt the query if my assumption is wrong.
SELECT id, "issueStatistic"
FROM (SELECT "ReviewPacksModel"."id",
(SELECT json_build_object(
'totalIssues', COUNT(*),
'openIssues', COUNT(*) filter (where "issues".status = 'Open'),
'fixedIssues', COUNT(*) filter (where "issues".status = 'Fixed')
)
FROM "development"."issues" "issues"
JOIN "development"."reviewTasks" as "rt" ON "issues"."reviewTaskId" = "rt".id
WHERE "issues"."isDeleted" = false
AND "rt"."reviewPackId" = "ReviewPacksModel"."id"
) as "issueStatistic"
FROM "development"."reviewPacks" AS "ReviewPacksModel"
WHERE "ReviewPacksModel"."projectId" = '2'
AND "ReviewPacksModel"."mode" IN ('Default', 'Live')
AND "ReviewPacksModel"."status" IN ('Draft', 'Active')
) AS subq
ORDER BY CAST ("issueStatistic"->>'totalIssues' AS bigint)
LIMIT 50;
You cannot order by type json because, simply put, there is no definition of how to compare the different types that a JSON object may contain. But this expression gives you a value of type json:
"issueStatistic"->'totalIssues'
However, type jsonb can be ordered. So, instead of creating a type json object, you should use jsonb_build_object() to create a type jsonb object.
Alternatively, you could cast your expression to type int. Note the ->> operator instead of your ->: it returns the value as type text, which can then be cast directly to int:
("issueStatistic"->>'totalIssues')::int
Edit:
As Laurenz mentioned correctly, to use aliases you need a separate subquery:
SELECT
*
FROM (
-- <your query minus ORDER clause>
) s
ORDER BY "issueStatistic"->'totalIssues'
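The cast to a numeric type in the answers above is not cosmetic: a value extracted as text sorts character by character. The sketch below shows the difference using an in-memory SQLite database as a stand-in for PostgreSQL; the `stats` table and its `total_text` column are hypothetical and simply play the role of the text returned by `->>`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (id INTEGER, total_text TEXT)")
# Counts stored as text, as they would be after extracting a JSON field as text.
conn.executemany(
    "INSERT INTO stats VALUES (?, ?)", [(1, "9"), (2, "10"), (3, "100")]
)

# Ordering by the raw text compares strings: '10' < '100' < '9'.
as_text = [r[0] for r in conn.execute("SELECT id FROM stats ORDER BY total_text")]

# Casting first gives the numeric order: 9 < 10 < 100.
as_int = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM stats ORDER BY CAST(total_text AS INTEGER)"
    )
]
print(as_text, as_int)
```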

Right way to implement pandas.read_sql with ClickHouse

Trying to implement pandas.read_sql function.
I created a clickhouse table and filled it:
create table regions
(
date DateTime Default now(),
region String
)
engine = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY tuple()
SETTINGS index_granularity = 8192;
insert into regions (region) values ('Asia'), ('Europe')
Then python code:
import pandas as pd
from sqlalchemy import create_engine
uri = 'clickhouse://default:@localhost/default'
engine = create_engine(uri)
query = 'select * from regions'
pd.read_sql(query, engine)
As the result I expected to get a dataframe with columns date and region but all I get is empty dataframe:
Empty DataFrame
Columns: [2021-01-08 09:24:33, Asia]
Index: []
UPD. It turned out that using clickhouse+native solves the problem.
Can it be solved without +native?
There is an old issue about this: https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/10. There is also a hint that suggests adding FORMAT TabSeparatedWithNamesAndTypes at the end of the query. So the initial query would look like this:
select *
from regions
FORMAT TabSeparatedWithNamesAndTypes
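For context, TabSeparatedWithNamesAndTypes puts the column names on the first row and the column types on the second, with data rows after that. The sketch below parses a hand-written payload in that layout; the payload and the `parse_with_names_and_types` helper are hypothetical, and the parsing ignores ClickHouse escape sequences. It is meant only to show why a reader that expects those two header rows would otherwise consume data rows as headers.

```python
import csv
import io

# Hypothetical response body in TabSeparatedWithNamesAndTypes layout.
payload = (
    "date\tregion\n"
    "DateTime\tString\n"
    "2021-01-08 09:24:33\tAsia\n"
    "2021-01-08 09:24:33\tEurope\n"
)


def parse_with_names_and_types(text):
    """Split a payload into column names, column types, and data rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    names = next(reader)  # row 1: column names
    types = next(reader)  # row 2: column types
    rows = [dict(zip(names, row)) for row in reader]  # remaining rows: data
    return names, types, rows


names, types, rows = parse_with_names_and_types(payload)
print(names, types, len(rows))
```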

Multiplying modified columns to form a new column

I want a query like:
select tp.package_rate, sum(sell) as sold_quantity , select(tp.package_rate * sold_quantity ) as sell_amount from tbl_ticket_package tp where tp.event_id=1001
Here the system raises an error on the multiplication:
sold_quantity is invalid column
Another problem is that I want the multiplication to use only the package_rate returned by the select on tp.package_rate, but it multiplies with every package_rate in the table.
What would you suggest? I want to bind this query to a GridView. Is there any way to do it using an ASP.NET GridView?
Your problem is that you are referring to sold_quantity here :
select(tp.package_rate * sold_quantity )
The alias is not recognized at this point. You will have to replace it with sum(sell). You will also have to group by tp.package_rate.
Your query should ideally be like :
select tp.package_rate, sum(sell) as sold_quantity ,
(tp.package_rate * sum(sell) ) as sell_amount from tbl_ticket_package tp
where tp.event_id=1001 group by tp.package_rate;
I am guessing that tp.package_rate is unique for a given event_id, from latter part of your question. If that's not the case, the sql you have written makes no sense.
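A minimal sketch of the suggested query, run against an in-memory SQLite database with invented rows (the original question does not name its database, so SQLite is an assumption here). It repeats the aggregate instead of reusing the alias and groups by `package_rate`, so each rate is multiplied only by its own sold quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_ticket_package "
    "(event_id INTEGER, package_rate REAL, sell INTEGER)"
)
# Hypothetical sales: two rates for event 1001, plus a row for another event.
conn.executemany(
    "INSERT INTO tbl_ticket_package VALUES (?, ?, ?)",
    [(1001, 50.0, 3), (1001, 50.0, 2), (1001, 80.0, 1), (2002, 50.0, 9)],
)

rows = conn.execute(
    """
    SELECT tp.package_rate,
           SUM(tp.sell) AS sold_quantity,
           tp.package_rate * SUM(tp.sell) AS sell_amount
    FROM tbl_ticket_package tp
    WHERE tp.event_id = 1001
    GROUP BY tp.package_rate
    ORDER BY tp.package_rate
    """
).fetchall()
print(rows)
```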