absfuyu.extra.da.dadf module
Absfuyu: Data Analysis
Data Analyst DataFrame
Version: 5.1.0 Date updated: 10/03/2025 (dd/mm/yyyy)
- class absfuyu.extra.da.dadf.DADF(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]
Bases: ShowAllMethodsMixin, DataAnalystDataFrameCityMixin, DataAnalystDataFrameDateMixin, DataAnalystDataFrameOtherMixin, DataAnalystDataFrameNAMixin, DataAnalystDataFrameInfoMixin, DataAnalystDataFrameRowMethodMixin, DataAnalystDataFrameColumnMethodMixin
Data Analyst pd.DataFrame
For a list of extra methods:
>>> print(DADF.DADF_METHODS)
- classmethod dadf_help() list[str] [source]
Show all available methods of DataAnalystDataFrame
Added in version 3.2.0
Deprecated in version 5.1.0
- classmethod sample_df(size: int = 100) Self [source]
Create sample DataFrame
- Parameters:
size (int) – Number of observations, by default 100
- Returns:
DataFrame with these columns: [number, number_big, number_range, missing_value, text, date]
- Return type:
Self
Example:
>>> DataAnalystDataFrame.sample_df()
      number  number_big  number_range  missing_value      text        date
0  -2.089770         785           700            NaN  vwnlqoql  2013-11-20
1  -0.526689         182           100           24.0  prjjcvqc  2007-04-13
2  -1.596514         909           900            8.0  cbcpzlac  2023-05-24
3   2.982191         989           900           21.0  ivwqwuvd  2022-04-28
4   1.687803         878           800            NaN  aajtncum  2005-10-05
..       ...         ...           ...            ...       ...         ...
95 -1.295145         968           900           16.0  mgqunkhi  2016-04-12
96  1.296795         255           200            NaN  lwvytego  2014-05-10
97  1.440746         297           200            5.0  lqsoykun  2010-04-03
98  0.327702         845           800            NaN  leadkvsy  2005-08-05
99  0.556720         981           900           36.0  bozmxixy  2004-02-22
[100 rows x 6 columns]
- class absfuyu.extra.da.dadf.DataAnalystDataFrameColumnMethodMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]
Bases: DataAnalystDataFrameBase
Data Analyst pd.DataFrame
- Column method
Rearrange rightmost column
Drop columns
Drop rightmost column
Add blank column
- rearrange_rightmost_column(insert_to_col: str, num_of_cols: int = 1) Self [source]
Move right-most columns to selected position
- Parameters:
insert_to_col (str) – Name of the column that the right-most column will be moved next to
num_of_cols (int) – Number of columns moved, by default 1
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> df = DADF.sample_df(2)
>>> df
     number  number_big  number_range  missing_value      text        date
0 -1.583590         756           700            NaN  eqklyckc  2023-05-20
1  0.203968         167           100            NaN  wzrsxinb  2011-02-27
>>> df.rearrange_rightmost_column("number")
     number        date  number_big  number_range  missing_value      text
0 -1.583590  2023-05-20         756           700            NaN  eqklyckc
1  0.203968  2011-02-27         167           100            NaN  wzrsxinb
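The reordering shown in the example can be sketched in plain pandas. This is only an illustration of the behaviour described above, not the library's implementation; the helper name `move_rightmost` is hypothetical, and keeping the moved columns in their original relative order is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "number": [0.1],
        "number_big": [756],
        "text": ["abc"],
        "date": ["2023-05-20"],
    }
)


def move_rightmost(frame, insert_to_col, num_of_cols=1):
    """Hypothetical helper: move the last `num_of_cols` columns
    so that they sit directly after `insert_to_col`."""
    cols = list(frame.columns)
    moved = cols[-num_of_cols:]      # right-most columns to relocate
    rest = cols[:-num_of_cols]       # everything else, in original order
    pos = rest.index(insert_to_col) + 1
    return frame[rest[:pos] + moved + rest[pos:]]


result = move_rightmost(df, "number")
print(list(result.columns))
```

Selecting with a reordered list of column names returns a new DataFrame and leaves `df` untouched.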
- drop_columns(columns: Sequence[str]) Self [source]
Drop columns in DataFrame
- Parameters:
columns (Sequence[str]) – List of columns to drop
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> df = DADF.sample_df(2)
>>> df
     number  number_big  number_range  missing_value      text        date
0 -0.283019         666           600            NaN  ztoeeblx  2022-11-13
1  1.194725         939           900            NaN  fxardqvh  2005-08-04
>>> df.drop_columns(["date", "text"])
     number  number_big  number_range  missing_value
0 -0.283019         666           600            NaN
1  1.194725         939           900            NaN
- drop_rightmost(num_of_cols: int = 1) Self [source]
Drop num_of_cols right-most columns
- Parameters:
num_of_cols (int) – Number of columns to drop
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> df = DADF.sample_df(2)
>>> df
     number  number_big  number_range  missing_value      text        date
0  0.851953         572           500              5  ncpbnzef  2020-08-15
1  0.381643         595           500             53  iojogbgj  2011-12-04
>>> df.drop_rightmost(5)
     number
0  0.851953
1  0.381643
- add_blank_column(column_name: str, fill: Any = nan, /) Self [source]
Add a blank column
- Parameters:
column_name (str) – Name of the column to add
fill (Any) – Fill the column with data
- Returns:
Modified DataFrame
- Return type:
Self
Deprecated in version 5.1.0: Use pd.DataFrame.assign(…) method instead
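As a brief illustration of the replacement suggested by the deprecation note, `pd.DataFrame.assign` can add a column filled with NaN or with any constant. The column names `blank` and `flag` below are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Add a column filled with NaN (the documented default fill value)
with_blank = df.assign(blank=np.nan)

# Add a column filled with a constant instead
with_flag = df.assign(flag=0)

print(with_blank)
print(with_flag)
```

`assign` returns a new DataFrame, so `df` itself keeps its single column.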
- class absfuyu.extra.da.dadf.DataAnalystDataFrameRowMethodMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]
Bases: DataAnalystDataFrameBase
Data Analyst pd.DataFrame
- Row method
Get different rows
- get_different_rows(other: Self | DataFrame) Self [source]
Subtract DataFrame to find the different rows
- Parameters:
other (Self | pd.DataFrame) – DataFrame to subtract
- Returns:
Different row DataFrame
- Return type:
Self
Example:
>>> df1 = DADF({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
>>> df2 = DADF({"A": [1, 2, 3, 4], "B": [7, 6, 6, 8]})
>>> df1.get_different_rows(df2)
   A  B
0  1  7
2  3  6
Added in version 4.0.0
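One way to reproduce the output of the example with plain pandas is an element-wise comparison. This is a sketch under the assumption that both frames share the same index and columns; the library's actual implementation may differ.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
df2 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [7, 6, 6, 8]})

# True for every row in which at least one cell differs
changed = (df1 != df2).any(axis=1)

# Keep the differing rows, taking the values from `df2`
# (this matches the values shown in the example output)
diff = df2[changed]
print(diff)
```

Note that `NaN != NaN` evaluates to True in pandas, so rows containing NaN in both frames would also be reported as different by this sketch.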
- class absfuyu.extra.da.dadf.DataAnalystDataFrameInfoMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]
Bases: DataAnalystDataFrameBase
Data Analyst pd.DataFrame
- Info
Quick info
Quick describe
Show distribution
Threshold filter
- qinfo() str [source]
Show quick information about DataFrame
Example:
>>> DADF.sample_df().qinfo()
Dataset Information:
- Number of Rows: 100
- Number of Columns: 6
- Total observation: 600
- Missing value: 13 (2.17%)
Column names: ['number', 'number_big', 'number_range', 'missing_value', 'text', 'date']
Added in version 3.2.0
- describe(percentiles=None, include=None, exclude=None) Self [source]
pd.DataFrame.describe() override
- qdescribe() Self [source]
Quick describe() that excludes object and datetime dtypes
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> DADF.sample_df().qdescribe()
           number  number_big  missing_value
count  100.000000  100.000000      48.000000
mean    -0.052935  586.750000      22.916667
std      0.954170  237.248596      11.987286
min     -2.392952  105.000000       3.000000
25%     -0.738311  407.500000      13.000000
50%     -0.068014  607.000000      23.500000
75%      0.614025  790.250000      36.000000
max      2.512533  988.000000      42.000000
Added in version 3.2.0
- show_distribution(column_name: str, dropna: bool = True, *, show_percentage: bool = True, percentage_round_up: int = 2) Self [source]
Show distribution of a column
- Parameters:
column_name (str) – Column to show distribution
dropna (bool) – Count N/A when False (Default: True)
show_percentage (bool) – Show proportion in range 0% - 100% instead of [0, 1] (Default: True)
percentage_round_up (int) – Round up to which decimals (Default: 2)
- Returns:
Distribution DataFrame
- Return type:
Self
Example:
>>> DADF.sample_df().show_distribution("number_range")
   number_range  count  percentage
0           900     16        16.0
1           700     15        15.0
2           300     12        12.0
3           200     12        12.0
4           400     11        11.0
5           600     11        11.0
6           800     10        10.0
7           100      9         9.0
8           500      4         4.0
Added in version 3.2.0
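A distribution table of this shape can be approximated in plain pandas with `value_counts`. The sketch below is not the library's code; the column `grade` and its data are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"grade": ["a", "a", "a", "b"]})

# Count occurrences of each value, most frequent first
counts = df["grade"].value_counts(dropna=True)
values = counts.to_numpy()

# Build the table by hand so the column names do not depend on the pandas version
dist = pd.DataFrame(
    {
        "grade": counts.index.tolist(),
        "count": values,
        # Share of each value in the range 0-100, rounded to 2 decimals
        "percentage": (values / values.sum() * 100).round(2),
    }
)
print(dist)
```

Passing `dropna=False` to `value_counts` would add a row for missing values, which corresponds to the documented `dropna` switch.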
- threshold_filter(destination_column: str, threshold: int | float = 10, *, top: int | None = None, replace_with: Any = 'Other') Self [source]
Replace every value whose share of the column is smaller than the threshold with replace_with. As a result, a pie chart of the column is less messy.
- Parameters:
destination_column (str) – Column to be filtered
threshold (int | float) – Which percentage to cut-off (Default: 10%)
top (int) – Only show top x categories in pie chart (replaces threshold mode) (Default: None)
replace_with (Any) – Replace all of the smaller data with specified value
- Returns:
Modified DataFrame
- Return type:
Self
Deprecated in version 5.1.0: Rework THIS
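The idea behind the threshold mode can be illustrated in plain pandas. This is a sketch of the concept only, since the method itself is marked for rework; the column name `category`, the 20% threshold, and the data are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({"category": ["a"] * 6 + ["b"] * 3 + ["c"]})
threshold = 20  # percent

col = df["category"]

# Share (in percent) of the category each row belongs to
share = col.map(col.value_counts(normalize=True)) * 100

# Keep categories at or above the threshold, replace the rest with "Other"
filtered = df.assign(category=col.where(share >= threshold, "Other"))
print(filtered["category"].value_counts())
```

Here "c" makes up 10% of the rows and is replaced, while "a" (60%) and "b" (30%) are kept.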
- class absfuyu.extra.da.dadf.DataAnalystDataFrameNAMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]
Bases: DataAnalystDataFrameBase
Data Analyst pd.DataFrame
- Missing value
Fill missing values
Get missing values
Split N/A
Apply not null
Apply not null row
- fill_missing_values(column_name: str, fill: Any = nan, *, fill_when_not_exist: Any = nan) Self [source]
Fill missing values in specified column
- Parameters:
column_name (str) – Column name
fill (Any) – Value used to fill the missing values, by default np.nan
fill_when_not_exist (Any) – When column_name does not exist, create a new column and fill it with fill_when_not_exist, by default np.nan
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> df = DADF.sample_df(2)
>>> df
     number  number_big  number_range  missing_value      text        date
0  0.174303         926           900            NaN  tenkiakh  2006-09-08
1  0.305137         140           100            NaN  jzuddamf  2012-04-04
>>> df.fill_missing_values("missing_value", 0)
     number  number_big  number_range  missing_value      text        date
0  0.174303         926           900            0.0  tenkiakh  2006-09-08
1  0.305137         140           100            0.0  jzuddamf  2012-04-04
>>> df.fill_missing_values("missing_column", 0, fill_when_not_exist=0)
     number  number_big  number_range  missing_value      text        date  missing_column
0  0.174303         926           900            0.0  tenkiakh  2006-09-08               0
1  0.305137         140           100            0.0  jzuddamf  2012-04-04               0
- get_missing_values(hightlight: bool = True, *, percentage_round_up: int = 2) Self [source]
Get a DataFrame containing the count of missing values for each column
- Parameters:
hightlight (bool) – Shows only columns with missing values when True, by default True
percentage_round_up (int) – Round up to which decimals, by default 2
- Returns:
Missing value DataFrame
- Return type:
Self
Example:
>>> DADF.sample_df(152).get_missing_values()
               Num of N/A  Percentage
missing_value          42       27.63
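A comparable summary can be built in plain pandas from `isna().sum()`. The sketch below only mirrors the column names of the example output; the input data is invented and the library may compute the table differently.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, None, 3.0, None], "y": ["a", "b", "c", "d"]})

# Number of missing values per column
na_count = df.isna().sum()

summary = pd.DataFrame(
    {
        "Num of N/A": na_count,
        "Percentage": (na_count / len(df) * 100).round(2),
    }
)

# Keep only the columns that actually contain missing values
highlighted = summary[summary["Num of N/A"] > 0]
print(highlighted)
```

The last filtering step corresponds to what the `hightlight` flag is documented to do when it is True.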
- split_na(by_column: str) SplittedDF [source]
- Split DataFrame into 2 parts:
Without missing value in specified column
With missing value in specified column
- Parameters:
by_column (str) – Split by column
- Returns:
Splitted DataFrame
- Return type:
SplittedDF
Example:
>>> DADF.sample_df(10).split_na("missing_value")
SplittedDF(
df=
     number  number_big  number_range  missing_value      text        date
0  0.643254         690           600            3.0  cinvofwj  2018-08-15
2  0.499345         255           200           13.0  jasifzez  2005-06-01
3 -1.727036         804           800           38.0  esxjmger  2009-07-24
4  0.873058         690           600           32.0  htewfpld  2022-07-22
5 -2.389884         442           400           30.0  hbcnfogu  2006-02-25
8  0.264584         432           400            2.0  ejbvbmwn  2013-05-11
9  0.813655         137           100           20.0  oecttada  2024-11-22,
df_na=
     number  number_big  number_range  missing_value      text        date
1 -0.411354         363           300            NaN  juzecani  2014-12-02
6 -0.833857         531           500            NaN  ybnntryh  2023-11-03
7  1.355589         472           400            NaN  zjltghjr  2024-10-09
)
Added in version 3.1.0
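The same two-way split can be written in plain pandas with a boolean mask. This sketch returns two ordinary DataFrames rather than a `SplittedDF`, and the sample data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3, 4], "missing_value": [3.0, np.nan, 13.0, np.nan]}
)

# True where the chosen column is missing
is_na = df["missing_value"].isna()

df_not_na = df[~is_na]  # rows without a missing value in the column
df_na = df[is_na]       # rows with a missing value in the column

print(df_not_na)
print(df_na)
```

Both parts keep their original index labels, as in the example output above.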
- apply_notnull(col: str, callable: Callable[[Any], _R]) Self [source]
Apply callable only to the non-NaN values in the column
- Parameters:
col (str) – Column to apply
callable (Callable[[Any], _R]) – Callable
- Returns:
Applied DataFrame
- Return type:
Self
Example:
>>> DADF.sample_df(5).apply_notnull("missing_value", lambda _: "REPLACED")
     number  number_big  number_range missing_value      text        date
0  0.852218         157           100      REPLACED  dqzxaxxs  2006-03-08
1  1.522428         616           600           NaN  mivkaooe  2018-12-27
2  0.108506         745           700      REPLACED  qanwwjet  2005-07-14
3 -1.435079         400           400      REPLACED  ywahcasi  2024-05-20
4  0.118993         861           800      REPLACED  saoupuby  2019-04-28
Added in version 5.1.0
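In plain pandas, a similar effect is available through `Series.map` with `na_action="ignore"`, which skips missing values. This is offered as a comparison only, not as the library's implementation; the doubling function is an arbitrary example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"missing_value": [24.0, np.nan, 8.0]})

# The function is called for non-missing values only; NaN is passed through
applied = df.assign(
    missing_value=df["missing_value"].map(lambda v: v * 2, na_action="ignore")
)
print(applied)
```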
- apply_notnull_row(apply_when_null: Callable[[Any], _R] | _T | None = None, apply_when_not_null: Callable[[Any], _R] | _T | None = None, col_name: str | None = None) Self [source]
Apply a callable or value to each row of the DataFrame, depending on whether the row contains missing values.
- Parameters:
apply_when_null (Callable[[Any], R] | T | None, optional) – Callable or Any, by default None (returns whether the entire row is not null)
apply_when_not_null (Callable[[Any], R] | T | None, optional) – Callable or Any, by default None (returns whether the entire row is not null)
col_name (str | None, optional) – Output column name, by default None (uses custom name)
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> df = DADF({"A": [None, 2, 3, 4], "B": [1, None, 3, 4], "C": [None, 2, None, 4]})
>>> df.apply_notnull_row()
     A    B    C  applied_row_null
0  NaN  1.0  NaN             False
1  2.0  NaN  2.0             False
2  3.0  3.0  NaN             False
3  4.0  4.0  4.0              True
>>> df.apply_notnull_row(0, 1)
     A    B    C  applied_row_null
0  NaN  1.0  NaN                 0
1  2.0  NaN  2.0                 0
2  3.0  3.0  NaN                 0
3  4.0  4.0  4.0                 1
>>> df.apply_notnull_row(lambda _: "n", lambda _: "y", col_name="mod")
     A    B    C mod
0  NaN  1.0  NaN   n
1  2.0  NaN  2.0   n
2  3.0  3.0  NaN   n
3  4.0  4.0  4.0   y
Added in version 5.1.0
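The row-level flag in the example can be reproduced in plain pandas with `notna().all(axis=1)`. The sketch below covers the constant-value case only and is not the library's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [None, 2, 3, 4], "B": [1, None, 3, 4], "C": [None, 2, None, 4]}
)

# True only for rows in which every cell is non-missing
row_complete = df.notna().all(axis=1)

# Map the flag to custom labels, similar to the third call in the example
labelled = df.assign(mod=np.where(row_complete, "y", "n"))
print(labelled)
```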
- class absfuyu.extra.da.dadf.DataAnalystDataFrameOtherMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]
Bases: DataAnalystDataFrameBase
Data Analyst pd.DataFrame
- Other method/Stuff
Merge left
- merge_left(other: Self | DataFrame, on: str, columns: list[str] | None = None) Self [source]
Left merge of two DataFrames
- Parameters:
other (Self | pd.DataFrame) – DataFrame to merge
on (str) – Merge on which column
columns (list[str] | None, optional) – Columns to take from other DataFrame, by default None (Take all columns)
- Returns:
Merged DataFrame
- Return type:
Self
Example:
>>> df1 = DADF({
...     "id": [1, 2, 5],
...     "name": ["Alice", "Bob", "Rich"],
...     "age": [20, 20, 20],
... })
>>> df2 = DADF({
...     "id": [1, 2, 3],
...     "age": [25, 30, 45],
...     "department": ["HR", "IT", "PM"],
...     "salary": [50000, 60000, 55000],
... })
>>> df1.merge_left(df2, on="id")
   id   name  age_x  age_y department   salary
0   1  Alice     20   25.0         HR  50000.0
1   2    Bob     20   30.0         IT  60000.0
2   5   Rich     20    NaN        NaN      NaN
>>> df1.merge_left(df2, on="id", columns=["salary"])
   id   name   age department   salary
0   1  Alice  25.0         HR  50000.0
1   2    Bob  30.0         IT  60000.0
2   5   Rich   NaN        NaN      NaN
Added in version 4.0.0
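For comparison, the first call in the example has the same shape as a plain pandas left merge. The sketch below uses `DataFrame.merge` with `how="left"` on the example data; it does not attempt to model the `columns` argument.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"id": [1, 2, 5], "name": ["Alice", "Bob", "Rich"], "age": [20, 20, 20]}
)
df2 = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "age": [25, 30, 45],
        "department": ["HR", "IT", "PM"],
        "salary": [50000, 60000, 55000],
    }
)

# Keep every row of df1; unmatched ids get NaN in the columns coming from df2
merged = df1.merge(df2, on="id", how="left")
print(merged)
```

The overlapping `age` column receives the default `_x` and `_y` suffixes, as in the example output.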
- class absfuyu.extra.da.dadf.DataAnalystDataFrameDateMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]
Bases: DataAnalystDataFrameBase
Data Analyst pd.DataFrame
- Date
Add date column from month column
Add detail date
Delta date (how many days in between)
- add_date_from_month(month_column: str, *, col_name: str = 'date') Self [source]
Add dummy date column from month column
- Parameters:
month_column (str) – Month column
col_name (str) – New date column name, by default: "date"
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> df = (
...     DADF.sample_df(2)
...     .add_detail_date("date", mode="m")
...     .drop_columns(["date", "number", "number_range"])
... )
>>> df
   number_big  missing_value      text  month
0         755            NaN  lincgqzl      4
1         907            NaN  gxltrjku     10
>>> df.add_date_from_month("month")
   number_big  missing_value      text  month        date
0         755            NaN  lincgqzl      4  2025-04-01
1         907            NaN  gxltrjku     10  2025-10-01
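A dummy date of this kind can be assembled in plain pandas from year, month, and day parts. In the sketch below, the fixed year 2025 and day 1 are assumptions read off the example output; how the library chooses the year is not stated in this page.

```python
import pandas as pd

df = pd.DataFrame({"month": [4, 10]})

# Assumption: use a fixed year and the first day of each month
parts = pd.DataFrame({"year": 2025, "month": df["month"], "day": 1})

# pd.to_datetime accepts a DataFrame with year/month/day columns
with_date = df.assign(date=pd.to_datetime(parts))
print(with_date)
```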
- add_detail_date(date_column: str, mode: str = 'dwmy') Self [source]
- Add these columns from date_column:
date (won't add if date_column value is "date")
day (overwrite if already exist)
week (overwrite if already exist)
month (overwrite if already exist)
year (overwrite if already exist)
- Parameters:
date_column (str) – Date column
mode (str) – Detailed column to add (Default: "dwmy")
d: day
w: week number
m: month
y: year
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> df = DADF.sample_df(2)
>>> df
     number  number_big  number_range  missing_value      text        date
0  0.331195         902           900             20  fgyanxik  2021-10-18
1 -0.877727         378           300             13  dqvaggjo  2007-03-06
>>> df.add_detail_date("date")
     number  number_big  number_range  missing_value      text        date  day  week  month  year
0  0.331195         902           900             20  fgyanxik  2021-10-18   18    42     10  2021
1 -0.877727         378           300             13  dqvaggjo  2007-03-06    6    10      3  2007
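The detail columns can be derived in plain pandas through the `.dt` accessor. This sketch assumes ISO week numbering, which agrees with the week values in the example (42 and 10), although the page does not state which numbering the library uses.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2021-10-18", "2007-03-06"])})

detailed = df.assign(
    day=df["date"].dt.day,
    week=df["date"].dt.isocalendar().week,  # ISO 8601 week number (assumption)
    month=df["date"].dt.month,
    year=df["date"].dt.year,
)
print(detailed)
```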
- delta_date(date_column: str, mode: Literal['now', 'between_row'] = 'now', *, col_name: str = 'delta_date') Self [source]
Calculate date interval
- Parameters:
date_column (str) – Date column
mode (str) – Mode to calculate (Default: "now")
"between_row": Calculate date interval between each row
"now": Calculate date interval to current date
col_name (str) – New delta date column name (Default: "delta_date")
- Returns:
Modified DataFrame
- Return type:
Self
Example:
>>> df = DADF.sample_df(2)
>>> df
     number  number_big  number_range  missing_value      text        date
0 -0.729988         435           400             21  xkrqqouf  2014-08-01
1 -0.846031         210           200              5  rbkmiqxt  2024-07-10
>>> df.delta_date("date")
     number  number_big  number_range  missing_value      text        date  delta_date
0 -0.729988         435           400             21  xkrqqouf  2014-08-01        3873
1 -0.846031         210           200              5  rbkmiqxt  2024-07-10         242
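The "now" mode can be sketched in plain pandas as a subtraction from a reference timestamp. To keep the result reproducible, the sketch uses a fixed reference date of 2025-03-09, chosen because it yields the numbers shown in the example; the library itself is documented to use the current date.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2014-08-01", "2024-07-10"])})

# Assumption: fixed stand-in for "now" so the output does not change over time
reference = pd.Timestamp("2025-03-09")

# Whole days between each date and the reference date
with_delta = df.assign(delta_date=(reference - df["date"]).dt.days)
print(with_delta)
```

Replacing `reference` with `pd.Timestamp.now()` would give the interval to the actual current date.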
- class absfuyu.extra.da.dadf.DataAnalystDataFrameCityMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]
Bases: DataAnalystDataFrameBase
Data Analyst pd.DataFrame
- City
Convert city
- convert_city(city_column: str, city_list: list[CityData], *, mode: str = 'ra') Self [source]
Get region and area of a city
- Parameters:
city_column (str) – Column containing city data
city_list (list[CityData]) – List of city in correct format (Default: None)
mode (str) – Detailed column to add (Default: "ra")
r: region
a: area
- Returns:
Modified DataFrame
- Return type:
DataAnalystDataFrame