SQLite - Complex Query

This is what I want to get.
Art | CANTIDAD1 | CANTIDAD2 | CANTIDAD1CARGA1 | CANTIDAD2CARGA1 | CANTIDAD1CARGA2 | CANTIDAD2CARGA2
----+-----------+-----------+-----------------+-----------------+-----------------+----------------
001 |         7 |         0 |               4 |               0 |               3 |               0
002 |         0 |         2 |               0 |               1 |               0 |               1
003 |         2 |         0 |               2 |               0 |               0 |               0
004 |         3 |         0 |               1 |               0 |               2 |               0
005 |         2 |         0 |               0 |               0 |               2 |               0
006 |         0 |         1 |               0 |               0 |               0 |               1
I get CANTIDAD1 and CANTIDAD2 with the query below; they are the sums of the quantities matching the WHERE clause:
SELECT
    SUM(D.NCANTIDAD1) AS NTOTCANTIDAD1,
    SUM(D.NCANTIDAD2) AS NTOTCANTIDAD2
FROM
    CABPEDIDOS C,
    DETPEDIDOS D,
    ARTICULOS A
WHERE
    C.DFECHAALBARAN IS NULL
    AND C.CSERIE = D.CSERIE
    AND C.NPEDIDO = D.NPEDIDO
    AND D.NFABRICANTE = A.NFABRICANTE
    AND D.CARTICULO = A.CARTICULO
GROUP BY
    D.NFABRICANTE, D.CARTICULO, A.CNOMBRE
CANTIDAD1CARGA1 and CANTIDAD2CARGA1 are quantities that are in the database (d.cantidad1 and d.cantidad2 are the real names; I have to sum all of them to get CANTIDAD1 and CANTIDAD2), but I need to get the quantities corresponding to the respective C.NCARGA:
(CANTIDAD1 = CANTIDAD1CARGA1 + CANTIDAD1CARGA2)
How can I get these values?
Note: C.NCARGA can have more than one value; I need to get all the CANTIDAD1CARGA'x' and CANTIDAD2CARGA'x' columns.
I don't care if I have to use two queries:
- one for CANTIDAD1 and CANTIDAD2
- another for CANTIDAD1CARGA1, CANTIDAD2CARGA1, CANTIDAD1CARGA2... etc.

I have a feeling I'm not really understanding the question, but it seems like you just need:
SELECT CANTIDAD1CARGA1 + CANTIDAD1CARGA2 AS CANTIDAD1,
       CANTIDAD2CARGA1 + CANTIDAD2CARGA2 AS CANTIDAD2,
       CANTIDAD1CARGA1, CANTIDAD2CARGA1, CANTIDAD1CARGA2, CANTIDAD2CARGA2
FROM ...
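If the per-CARGA columns don't exist yet and have to be computed from the base tables, one common approach is conditional aggregation. A minimal sketch, assuming the Art column in the desired output is A.CARTICULO and that C.NCARGA takes the values 1 and 2 as in the sample (for other values, add more CASE branches, or generate the SQL dynamically from SELECT DISTINCT C.NCARGA, since SQLite has no PIVOT):

SELECT
    A.CARTICULO AS Art,
    SUM(D.NCANTIDAD1) AS CANTIDAD1,
    SUM(D.NCANTIDAD2) AS CANTIDAD2,
    -- per-carga breakdown: only count detail rows whose header has that NCARGA
    SUM(CASE WHEN C.NCARGA = 1 THEN D.NCANTIDAD1 ELSE 0 END) AS CANTIDAD1CARGA1,
    SUM(CASE WHEN C.NCARGA = 1 THEN D.NCANTIDAD2 ELSE 0 END) AS CANTIDAD2CARGA1,
    SUM(CASE WHEN C.NCARGA = 2 THEN D.NCANTIDAD1 ELSE 0 END) AS CANTIDAD1CARGA2,
    SUM(CASE WHEN C.NCARGA = 2 THEN D.NCANTIDAD2 ELSE 0 END) AS CANTIDAD2CARGA2
FROM CABPEDIDOS C
JOIN DETPEDIDOS D ON C.CSERIE = D.CSERIE AND C.NPEDIDO = D.NPEDIDO
JOIN ARTICULOS A ON D.NFABRICANTE = A.NFABRICANTE AND D.CARTICULO = A.CARTICULO
WHERE C.DFECHAALBARAN IS NULL
GROUP BY D.NFABRICANTE, D.CARTICULO, A.CNOMBRE;

This produces all the columns in one pass, so CANTIDAD1 = CANTIDAD1CARGA1 + CANTIDAD1CARGA2 holds by construction.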

Related

How do you control float formatting when using DataFrame.to_markdown in pandas?

I'm trying to use DataFrame.to_markdown with a dataframe that contains float values that I'd like to have rounded off. Without to_markdown() I can just set pd.options.display.float_format and everything works fine, but to_markdown doesn't seem to be respecting that option.
Repro:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [42.42, 99.11234123412341234, -23]])
pd.options.display.float_format = '{:,.0f}'.format
print(df)
print()
print(df.to_markdown())
outputs:
    0   1    2
0   1   2    3
1  42  99  -23
| | 0 | 1 | 2 |
|---:|------:|--------:|----:|
| 0 | 1 | 2 | 3 |
| 1 | 42.42 | 99.1123 | -23 |
(compare the 42.42 and 99.1123 in the to_markdown table to the 42 and 99 in the plain old df)
Is this a bug or am I missing something about how to use to_markdown?
It looks like pandas uses tabulate for this formatting. If it's installed, you can use something like:
df.to_markdown(floatfmt=".0f")
output:
| | 0 | 1 | 2 |
|---:|----:|----:|----:|
| 0 | 1 | 2 | 3 |
| 1 | 42 | 99 | -23 |
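If you'd rather not rely on a tabulate-specific option, a simple workaround (a sketch; note that round() changes the underlying values, not just the display) is to round the frame before converting:

print(df.round(0).to_markdown())

tabulate's floatfmt also accepts a sequence of per-column format strings, which to_markdown should pass straight through via **kwargs, though whether the index counts as the first column may depend on your tabulate version.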

Iterate through pandas data frame and replace some strings with numbers

I have a dataframe sample_df that looks like:
        bar           foo
0  rejected  unidentified
1     clear       caution
2   caution           NaN
Note this is just a random made-up df; there are lots of other columns, say with different data types than just text. bar and foo might also have lots of empty cells/values, which are NaNs.
The actual df looks like this (the above is just a sample, btw):
| | Unnamed: 0 | user_id | result | face_comparison_result | created_at | facial_image_integrity_result | visual_authenticity_result | properties | attempt_id |
|-----:|-------------:|:---------------------------------|:---------|:-------------------------|:--------------------|:--------------------------------|:-----------------------------|:----------------|:---------------------------------|
| 0 | 58 | ecee468d4a124a8eafeec61271cd0da1 | clear | clear | 2017-06-20 17:50:43 | clear | clear | {} | 9e4277fc1ddf4a059da3dd2db35f6c76 |
| 1 | 76 | 1895d2b1782740bb8503b9bf3edf1ead | clear | clear | 2017-06-20 13:28:00 | clear | clear | {} | ab259d3cb33b4711b0a5174e4de1d72c |
| 2 | 217 | e71b27ea145249878b10f5b3f1fb4317 | clear | clear | 2017-06-18 21:18:31 | clear | clear | {} | 2b7f1c6f3fc5416286d9f1c97b15e8f9 |
| 3 | 221 | f512dc74bd1b4c109d9bd2981518a9f8 | clear | clear | 2017-06-18 22:17:29 | clear | clear | {} | ab5989375b514968b2ff2b21095ed1ef |
| 4 | 251 | 0685c7945d1349b7a954e1a0869bae4b | clear | clear | 2017-06-18 19:54:21 | caution | clear | {} | dd1b0b2dbe234f4cb747cc054de2fdd3 |
| 5 | 253 | 1a1a994f540147ab913fcd61b7a859d9 | clear | clear | 2017-06-18 20:05:05 | clear | clear | {} | 1475037353a848318a32324539a6947e |
| 6 | 334 | 26e89e4a60f1451285e70ca8dc5bc90e | clear | clear | 2017-06-17 20:21:54 | suspected | clear | {} | 244fa3e7cfdb48afb44844f064134fec |
| 7 | 340 | 41afdea02a9c42098a15d94a05e8452b | NaN | clear | 2017-06-17 20:42:53 | clear | clear | {} | b066a4043122437bafae3ddcf6c2ab07 |
| 8 | 424 | 6cf6eb05a3cc4aabb69c19956a055eb9 | rejected | NaN | 2017-06-16 20:00:26 |
I want to replace any strings I find with numbers, per the below mapping.
def no_strings(df):
    columns = list(df)
    for column in columns:
        df[column] = df[column].map(result_map)

# We will need a mapping of strings to numbers to be able to analyse later.
result_map = {'unidentified': 0, "clear": 1, 'suspected': 2, "caution": 3, 'rejected': 4}
So the output might look like:
   bar  foo
0    4    0
1    1    3
2    3  NaN
For some reason, when I run no_strings(sample_df) I get errors.
What am I doing wrong?
df['bar'] = df['bar'].map(result_map)
df['foo'] = df['foo'].map(result_map)
df
   bar  foo
0    4    0
1    1    3
2    3    2
However, if you wish to be on the safe side (in case a key/value is not in your result_map and you don't want to see a NaN), do this:
df['foo'] = df['foo'].map(lambda x: result_map.get(x, 'not found'))
df['bar'] = df['bar'].map(lambda x: result_map.get(x, 'not found'))
So, this input df:
        bar           foo
0  rejected  unidentified
1     clear       caution
2   caution     suspected
3     sdgdg          0000
will result in:
         bar        foo
0          4          0
1          1          3
2          3          2
3  not found  not found
To apply this across several columns at once:
cols = ['foo', 'bar', 'other_columns']
for c in cols:
    df[c] = df[c].map(lambda x: result_map.get(x, 'not found'))
Let's try stack, map the dict, and then unstack:
df.stack().to_frame()[0].map(result_map).unstack()
   bar  foo
0    4    0
1    1    3
2    3    2
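One more option worth noting (a sketch, not from the original answers): DataFrame.replace leaves values that aren't in the mapping untouched, which sidesteps the original problem of .map() turning every unmapped cell into NaN when run over all columns:

# replaces mapped strings everywhere; unmapped values and non-text columns pass through unchanged
df = df.replace(result_map)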

SELECT 1 ID and all belonging elements

I'm trying to create a JSON select query that gives me back the result in the following way:
One row contains one main_message_id and its associated messages (like the second table below). The JSON format is not a requirement; if it works with other methods, that's fine too.
I store the data like this:
+-----------------+---------+----------------+
| main_message_id | message | sub_message_id |
+-----------------+---------+----------------+
| 1 | test 1 | 1 |
| 1 | test 2 | 2 |
| 1 | test 3 | 3 |
| 2 | test 4 | 4 |
| 2 | test 5 | 5 |
| 3 | test 6 | 6 |
+-----------------+---------+----------------+
I would like to create a query which gives me back the data like this:
+-----------------+-----------------------+
| main_message_id | message               |
+-----------------+-----------------------+
| 1               | {test1}{test2}{test3} |
| 2               | {test4}{test5}{test6} |
| 3               | {test7}{test8}{test9} |
+-----------------+-----------------------+
You can use json_agg() for that:
select main_message_id, json_agg(message) as messages
from the_table
group by main_message_id;
Note that {test1}{test2}{test3} is not valid JSON; the above will return a valid JSON array instead, e.g. ["test1", "test2", "test3"].
If you just want a comma-separated list, use string_agg():
select main_message_id, string_agg(message, ', ') as messages
from the_table
group by main_message_id;
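One detail the original answer doesn't mention (an addition, assuming sub_message_id reflects the desired order): without an ORDER BY inside the aggregate, the order of the concatenated messages is not guaranteed. Both aggregates accept one:

select main_message_id,
       string_agg(message, ', ' order by sub_message_id) as messages
from the_table
group by main_message_id;

json_agg(message order by sub_message_id) works the same way.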

Insert into table based on rows in same table

I have a table as per below which will contain multiple rows of data - a couple of thousand rows max.
lvl2 | lvl3 | lvl6 | this_rep_cycle | last_rep_cycle | prev_rep_cycle | rowType
================================================================================
ASSET | CURR | FI | 214,060,924,928 | 0 | 0 | 1-Total
ASSET | CURR | FI | 25,199,630,336 | 0 | 0 | 3-Bal
ASSET | CURR | FX | 123,941,472 | 0 | 0 | 1-Total
ASSET | CURR | FX | 0 | 0 | 0 | 3-Bal
What I need to do is insert a new row in the table with the same lvl2, lvl3, lvl6, but where:
this_rep_cycle = (this_rep_cycle for rowType = '3-Bal') / (this_rep_cycle for rowType = '1-Total')
last_rep_cycle = (last_rep_cycle for rowType = '3-Bal') / (last_rep_cycle for rowType = '1-Total')
prev_rep_cycle = (prev_rep_cycle for rowType = '3-Bal') / (prev_rep_cycle for rowType = '1-Total')
The end result should look like:
lvl2 | lvl3 | lvl6 | this_rep_cycle | last_rep_cycle | prev_rep_cycle | rowType
================================================================================
ASSET | CURR | FI | 214,060,924,928 | 0 | 0 | 1-Total
ASSET | CURR | FI | 25,199,630,336 | 0 | 0 | 3-Bal
ASSET | CURR | FI | 11.77 | 0 | 0 | 4-%
ASSET | CURR | FX | 123,941,472 | 0 | 0 | 1-Total
ASSET | CURR | FX | 0 | 0 | 0 | 3-Bal
ASSET | CURR | FX | 0 | 0 | 0 | 4-%
I have written a self-join to achieve this:
set arithignore on
select
pd_1.lvl2, pd_1.lvl3, pd_1.lvl4, pd_1.lvl6, pd_2.last_report_cycle as p2_lrp, pd_1.last_report_cycle as p1_lrp,
(pd_2.this_report_cycle / pd_1.this_report_cycle)*100 as this_report_cycle,
(pd_2.last_report_cycle / pd_1.last_report_cycle)*100 as last_report_cycle
-- (pd_2.prev_report_cycle / pd_1.prev_report_cycle)*100 as prev_report_cycle,
-- '4-%' as [percentage]
from ProxyTrending pd_1
inner join ProxyTrending pd_2 on pd_2.rowType = '3-Bal'
AND pd_1.lvl2 = pd_2.lvl2
AND pd_1.lvl3 = pd_2.lvl3
AND pd_1.lvl4 = pd_2.lvl4
AND pd_1.lvl6 = pd_2.lvl6
where pd_1.rowType = '1-Total'
--order by pd_1.lvl2, pd_1.lvl3, pd_1.lvl4, pd_1.lvl6
set arithignore off
I need SET ARITHIGNORE as I can experience divide-by-zero, but when I execute the above, it (partially) works only if just one of the (report_cycle / report_cycle)*100 lines is uncommented; with two or more of these lines, zero results are returned.
Also, if I have just one of the (report_cycle / report_cycle)*100 lines uncommented, 60 results are returned where there are 106 '1-Total' records and 106 '3-Bal' records; I would have expected the query to return 106 '4-%' results.
I'm not sure what I'm missing.
Well, I have found the cause of the error. If I execute:
select pd_2.this_report_cycle, pd_1.this_report_cycle...
the statement executes and returns values. But if I write:
select pd_2.this_report_cycle, pd_1.this_report_cycle,
(pd_2.this_report_cycle / pd_1.this_report_cycle)*100 as expected_result
the query returns zero rows if either of the values is 0, despite the ARITHIGNORE setting.
And a CASE expression solves the issue.
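The CASE expression itself isn't shown; a minimal sketch of the usual guard (an assumption about what was used, in SQL Server syntax to match the SET ARITHIGNORE above):

(pd_2.this_report_cycle /
 case when pd_1.this_report_cycle = 0 then null
      else pd_1.this_report_cycle end) * 100 as this_report_cycle

or, more compactly, NULLIF:

(pd_2.this_report_cycle / NULLIF(pd_1.this_report_cycle, 0)) * 100 as this_report_cycle

Either way the division yields NULL instead of raising a divide-by-zero error, so the rows are kept; wrap the expression in ISNULL(..., 0) if 0 is wanted in the output instead.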

How to add one column to another column on all rows in a table?

Given a table like the one below, I want to move and add all values from the column escrow1 to balance1 of its corresponding uid, and likewise for escrow2 to balance2. So in the case below, the row with uid 4 will have balance1 = 1858000 + 42000 and escrow1 = 0, the row with uid 3 will have balance1 = 1859265 + 30735 and escrow1 = 0, and the row with uid 2 will have balance2 = 940050 + 1050000 and escrow2 = 0. Everything else stays the same. Is it possible to do this in one query? I've been trying hard, but I can't come up with a solution, so I might have to do it in a function and loop over all the rows, which I would prefer not to. Also, I know that only a small number of rows will have escrow values not equal to 0. Given that, is there a way to optimize the query?
uid | balance1 | escrow1 | balance2 | escrow2
-----+----------+---------+----------+---------
1 | 5000 | 0 | 0 | 0
9 | 5000 | 0 | 0 | 0
6 | 1900000 | 0 | 1899960 | 0
5 | 1900000 | 0 | 1900000 | 0
7 | 1900000 | 0 | 1900000 | 0
8 | 1900000 | 0 | 1900000 | 0
4 | 1858000 | 42000 | 1900014 | 0
2 | 1910000 | 0 | 940050 | 1050000
3 | 1859265 | 30735 | 1895050 | 0
If you just want to select the data from the table, use the query provided by Greg. If you want to update the table itself, the query below can help.
UPDATE TABLENAME
SET Balance1 = Balance1 + Escrow1,
    Balance2 = Balance2 + Escrow2,
    Escrow1 = 0,
    Escrow2 = 0
Hope this helps.
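Since the question says only a few rows have non-zero escrow values, one small refinement (a sketch, not part of the original answer) is to restrict the update to just those rows, so the others are never written:

UPDATE TABLENAME
SET Balance1 = Balance1 + Escrow1,
    Balance2 = Balance2 + Escrow2,
    Escrow1 = 0,
    Escrow2 = 0
WHERE Escrow1 <> 0 OR Escrow2 <> 0;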
I think it's as simple as:
SELECT uid
,Balance1 + Escrow1 AS Balance1
,Balance2 + Escrow2 AS Balance2
FROM TableName
In terms of optimizing, I haven't done much with Postgres, but I doubt you'd need to do any (assuming you have a proper primary key, etc. on the table).