absfuyu.extra.da.dadf module

Absfuyu: Data Analysis

Data Analyst DataFrame

Version: 5.1.0 Date updated: 10/03/2025 (dd/mm/yyyy)

class absfuyu.extra.da.dadf.DADF(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: ShowAllMethodsMixin, DataAnalystDataFrameCityMixin, DataAnalystDataFrameDateMixin, DataAnalystDataFrameOtherMixin, DataAnalystDataFrameNAMixin, DataAnalystDataFrameInfoMixin, DataAnalystDataFrameRowMethodMixin, DataAnalystDataFrameColumnMethodMixin

Data Analyst pd.DataFrame

For a list of extra methods:

>>> print(DADF.DADF_METHODS)

classmethod dadf_help() list[str][source]

Show all available methods of DataAnalystDataFrame

Added in version 3.2.0

Deprecated in version 5.1.0

classmethod sample_df(size: int = 100) Self[source]

Create sample DataFrame

Parameters:

size (int) – Number of observations, by default 100

Returns:

DataFrame with these columns: [number, number_big, number_range, missing_value, text, date]

Return type:

Self

Example:

>>> DataAnalystDataFrame.sample_df()
      number  number_big number_range  missing_value      text       date
0  -2.089770         785          700            NaN  vwnlqoql 2013-11-20
1  -0.526689         182          100           24.0  prjjcvqc 2007-04-13
2  -1.596514         909          900            8.0  cbcpzlac 2023-05-24
3   2.982191         989          900           21.0  ivwqwuvd 2022-04-28
4   1.687803         878          800            NaN  aajtncum 2005-10-05
..       ...         ...          ...            ...       ...        ...
95 -1.295145         968          900           16.0  mgqunkhi 2016-04-12
96  1.296795         255          200            NaN  lwvytego 2014-05-10
97  1.440746         297          200            5.0  lqsoykun 2010-04-03
98  0.327702         845          800            NaN  leadkvsy 2005-08-05
99  0.556720         981          900           36.0  bozmxixy 2004-02-22
[100 rows x 6 columns]
class absfuyu.extra.da.dadf.DataAnalystDataFrameColumnMethodMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: DataAnalystDataFrameBase

Data Analyst pd.DataFrame - Column method

  • Rearrange rightmost column

  • Drop columns

  • Drop rightmost column

  • Add blank column

rearrange_rightmost_column(insert_to_col: str, num_of_cols: int = 1) Self[source]

Move right-most columns to selected position

Parameters:
  • insert_to_col (str) – Name of the column that the right-most column will be moved next to

  • num_of_cols (int) – Number of columns moved, by default 1

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> df = DADF.sample_df(2)
>>> df
     number  number_big number_range  missing_value      text       date
0 -1.583590         756          700            NaN  eqklyckc 2023-05-20
1  0.203968         167          100            NaN  wzrsxinb 2011-02-27
>>> df.rearrange_rightmost_column("number")
     number       date  number_big number_range  missing_value      text
0 -1.583590 2023-05-20         756          700            NaN  eqklyckc
1  0.203968 2011-02-27         167          100            NaN  wzrsxinb
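
The reordering shown above can be approximated in plain pandas. This is only a sketch: the helper name `move_rightmost` is hypothetical, it assumes unique column names and `num_of_cols >= 1`, and it is not necessarily how DADF implements the method.

```python
import pandas as pd


def move_rightmost(df: pd.DataFrame, insert_to_col: str, num_of_cols: int = 1) -> pd.DataFrame:
    """Move the last `num_of_cols` columns so they sit right after `insert_to_col`."""
    cols = list(df.columns)
    moved = cols[-num_of_cols:]          # right-most columns to relocate
    rest = cols[:-num_of_cols]           # remaining columns, original order
    pos = rest.index(insert_to_col) + 1  # slot just after the anchor column
    return df[rest[:pos] + moved + rest[pos:]]


df = pd.DataFrame({"number": [1.0], "number_big": [756], "text": ["abc"], "date": ["2023-05-20"]})
reordered = move_rightmost(df, "number")
print(list(reordered.columns))
```
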
drop_columns(columns: Sequence[str]) Self[source]

Drop columns in DataFrame

Parameters:

columns (Sequence[str]) – Names of the columns to drop

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> df = DADF.sample_df(2)
>>> df
     number  number_big number_range  missing_value      text       date
0 -0.283019         666          600            NaN  ztoeeblx 2022-11-13
1  1.194725         939          900            NaN  fxardqvh 2005-08-04
>>> df.drop_columns(["date", "text"])
     number  number_big number_range  missing_value
0 -0.283019         666          600            NaN
1  1.194725         939          900            NaN
drop_rightmost(num_of_cols: int = 1) Self[source]

Drop num_of_cols right-most columns

Parameters:

num_of_cols (int) – Number of columns to drop

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> df = DADF.sample_df(2)
>>> df
     number  number_big number_range  missing_value      text       date
0  0.851953         572          500              5  ncpbnzef 2020-08-15
1  0.381643         595          500             53  iojogbgj 2011-12-04
>>> df.drop_rightmost(5)
     number
0  0.851953
1  0.381643
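
For comparison, a plain-pandas sketch of the same idea using positional slicing. The frame and variable names are illustrative only, and the slice assumes `num_of_cols >= 1`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
num_of_cols = 2

# Keep every column except the last `num_of_cols`.
trimmed = df.iloc[:, :-num_of_cols]
print(list(trimmed.columns))
```
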
add_blank_column(column_name: str, fill: Any = nan, /) Self[source]

Add a blank column

Parameters:
  • column_name (str) – Name of the column to add

  • fill (Any) – Fill the column with data

Returns:

Modified DataFrame

Return type:

Self

Deprecated in version 5.1.0: Use pd.DataFrame.assign(…) method instead
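
A minimal sketch of the suggested replacement, using `pd.DataFrame.assign` on an ordinary DataFrame. The column names `notes` and `status` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# assign() returns a new DataFrame; a scalar value is broadcast to every row.
with_blank = df.assign(notes=np.nan)        # blank (NaN) column
with_fill = df.assign(status="pending")     # column filled with a constant
print(with_fill)
```

Because `assign` returns a copy, the original `df` is left unchanged.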

class absfuyu.extra.da.dadf.DataAnalystDataFrameRowMethodMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: DataAnalystDataFrameBase

Data Analyst pd.DataFrame - Row method

  • Get different rows

get_different_rows(other: Self | DataFrame) Self[source]

Subtract DataFrame to find the different rows

Parameters:

other (Self | pd.DataFrame) – DataFrame to subtract

Returns:

Different row DataFrame

Return type:

Self

Example:

>>> df1 = DADF({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
>>> df2 = DADF({"A": [1, 2, 3, 4], "B": [7, 6, 6, 8]})
>>> df1.get_different_rows(df2)
   A  B
0  1  7
2  3  6

Added in version 4.0.0
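
The output above can be reproduced in plain pandas with an element-wise comparison. This is a sketch, not necessarily the library's implementation: it assumes both frames share the same index and columns, and note that NaN never compares equal to NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
df2 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [7, 6, 6, 8]})

# A row counts as different if any of its cells differs.
differs = (df1 != df2).any(axis=1)

# Keep the rows of the second frame, as in the example above.
different_rows = df2[differs]
print(different_rows)
```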

class absfuyu.extra.da.dadf.DataAnalystDataFrameInfoMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: DataAnalystDataFrameBase

Data Analyst pd.DataFrame - Info

  • Quick info

  • Quick describe

  • Show distribution

  • Threshold filter

qinfo() str[source]

Show quick information about the DataFrame

Example:

>>> DADF.sample_df().qinfo()
Dataset Information:
- Number of Rows: 100
- Number of Columns: 6
- Total observation: 600
- Missing value: 13 (2.17%)

Column names: ['number', 'number_big', 'number_range', 'missing_value', 'text', 'date']

Added in version 3.2.0

describe(percentiles=None, include=None, exclude=None) Self[source]

pd.DataFrame.describe() override

qdescribe() Self[source]

Quick describe() that excludes object and datetime dtypes

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> DADF.sample_df().qdescribe()
           number  number_big  missing_value
count  100.000000  100.000000      48.000000
mean    -0.052935  586.750000      22.916667
std      0.954170  237.248596      11.987286
min     -2.392952  105.000000       3.000000
25%     -0.738311  407.500000      13.000000
50%     -0.068014  607.000000      23.500000
75%      0.614025  790.250000      36.000000
max      2.512533  988.000000      42.000000

Added in version 3.2.0
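
A rough plain-pandas approximation is to restrict the frame to numeric columns before calling `describe()`, which also leaves out object and datetime columns. Whether this matches `qdescribe()` for every dtype is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "number": [0.5, -1.2, 2.0],
    "text": ["a", "b", "c"],
    "date": pd.to_datetime(["2020-01-01", "2021-06-15", "2022-12-31"]),
})

# Only numeric columns survive the selection, so only they are summarised.
numeric_summary = df.select_dtypes(include="number").describe()
print(numeric_summary)
```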

show_distribution(column_name: str, dropna: bool = True, *, show_percentage: bool = True, percentage_round_up: int = 2) Self[source]

Show distribution of a column

Parameters:
  • column_name (str) – Column to show distribution

  • dropna (bool) – Count N/A when False (Default: True)

  • show_percentage (bool) – Show proportion in range 0% - 100% instead of [0, 1] (Default: True)

  • percentage_round_up (int) – Number of decimal places to round the percentage to (Default: 2)

Returns:

Distribution DataFrame

Return type:

Self

Example:

>>> DADF.sample_df().show_distribution("number_range")
  number_range  count  percentage
0          900     16        16.0
1          700     15        15.0
2          300     12        12.0
3          200     12        12.0
4          400     11        11.0
5          600     11        11.0
6          800     10        10.0
7          100      9         9.0
8          500      4         4.0

Added in version 3.2.0
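
A table of the same shape can be built in plain pandas from `value_counts()`. This is a sketch under the assumption that the percentage is the share of counted values; the column name `grade` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"grade": ["x", "y", "x", "x", None]})

counts = df["grade"].value_counts(dropna=True)       # sorted by frequency, descending
percentage = (counts / counts.sum() * 100).round(2)  # share of each value, 0-100

distribution = pd.DataFrame({
    "grade": counts.index,
    "count": counts.to_numpy(),
    "percentage": percentage.to_numpy(),
})
print(distribution)
```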

threshold_filter(destination_column: str, threshold: int | float = 10, *, top: int | None = None, replace_with: Any = 'Other') Self[source]

Replace every value in destination_column whose share of the data is smaller than the threshold with replace_with, so that a pie chart of the column is less cluttered.

Parameters:
  • destination_column (str) – Column to be filtered

  • threshold (int | float) – Which percentage to cut-off (Default: 10%)

  • top (int) – Keep only the top x categories for the pie chart; when set, this replaces threshold mode (Default: None)

  • replace_with (Any) – Replace all of the smaller data with specified value

Returns:

Modified DataFrame

Return type:

Self

Deprecated in version 5.1.0: Rework THIS
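
Since the method is deprecated, here is a sketch of the threshold idea in plain pandas. It only covers the threshold mode (not `top`), and the column name and data are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"category": ["a"] * 6 + ["b"] * 3 + ["c"]})
threshold = 20  # percent

# Share of each category in percent, then the categories below the cut-off.
share = df["category"].value_counts(normalize=True) * 100
small = share[share < threshold].index

# Keep values that are not "small"; replace the rest with "Other".
filtered = df.assign(category=df["category"].where(~df["category"].isin(small), "Other"))
print(filtered["category"].value_counts())
```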

class absfuyu.extra.da.dadf.DataAnalystDataFrameNAMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: DataAnalystDataFrameBase

Data Analyst pd.DataFrame - Missing value

  • Fill missing values

  • Get missing values

  • Split N/A

  • Apply not null

  • Apply not null row

fill_missing_values(column_name: str, fill: Any = nan, *, fill_when_not_exist: Any = nan) Self[source]

Fill missing values in specified column

Parameters:
  • column_name (str) – Column name

  • fill (Any) – Fill the missing values with, by default np.nan

  • fill_when_not_exist (Any) – When column_name does not exist, create a new column and fill with fill_when_not_exist, by default np.nan

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> df = DADF.sample_df(2)
>>> df
     number  number_big number_range  missing_value      text       date
0  0.174303         926          900            NaN  tenkiakh 2006-09-08
1  0.305137         140          100            NaN  jzuddamf 2012-04-04
>>> df.fill_missing_values("missing_value", 0)
     number  number_big number_range  missing_value      text       date
0  0.174303         926          900            0.0  tenkiakh 2006-09-08
1  0.305137         140          100            0.0  jzuddamf 2012-04-04
>>> df.fill_missing_values("missing_column", 0, fill_when_not_exist=0)
     number  number_big number_range  missing_value      text       date  missing_column
0  0.174303         926          900            0.0  tenkiakh 2006-09-08               0
1  0.305137         140          100            0.0  jzuddamf 2012-04-04               0
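
The two branches shown above (fill an existing column, or create a missing one) can be sketched in plain pandas. The helper name `fill_or_create` is hypothetical, and unlike the method it always returns a copy.

```python
import numpy as np
import pandas as pd


def fill_or_create(df, column_name, fill=np.nan, *, fill_when_not_exist=np.nan):
    """Fill NaN in an existing column, or create the column if it is absent."""
    if column_name in df.columns:
        return df.assign(**{column_name: df[column_name].fillna(fill)})
    return df.assign(**{column_name: fill_when_not_exist})


df = pd.DataFrame({"missing_value": [np.nan, 3.0]})
filled = fill_or_create(df, "missing_value", 0)
created = fill_or_create(df, "missing_column", 0, fill_when_not_exist=0)
print(created)
```
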
get_missing_values(hightlight: bool = True, *, percentage_round_up: int = 2) Self[source]

Get a DataFrame containing the count of missing values for each column

Parameters:
  • hightlight (bool) – Shows only columns with missing values when True, by default True

  • percentage_round_up (int) – Number of decimal places to round the percentage to, by default 2

Returns:

Missing value DataFrame

Return type:

Self

Example:

>>> DADF.sample_df(152).get_missing_values()
               Num of N/A  Percentage
missing_value          42       27.63
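
A comparable summary can be computed in plain pandas with `isna().sum()`. This sketch assumes the percentage is the N/A count divided by the number of rows, which agrees with the example above (42 / 152 ≈ 27.63).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [np.nan, 1.0, np.nan, 2.0]})

na_count = df.isna().sum()  # missing values per column
summary = pd.DataFrame({
    "Num of N/A": na_count,
    "Percentage": (na_count / len(df) * 100).round(2),
})

# Show only the columns that actually have missing values.
only_missing = summary[summary["Num of N/A"] > 0]
print(only_missing)
```
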
split_na(by_column: str) SplittedDF[source]
Split DataFrame into 2 parts:
  • Without missing value in specified column

  • With missing value in specified column

Parameters:

by_column (str) – Split by column

Returns:

Split DataFrame

Return type:

SplittedDF

Example:

>>> DADF.sample_df(10).split_na("missing_value")
SplittedDF(
    df=     number  number_big number_range  missing_value      text       date
0         0.643254         690          600            3.0  cinvofwj 2018-08-15
2         0.499345         255          200           13.0  jasifzez 2005-06-01
3        -1.727036         804          800           38.0  esxjmger 2009-07-24
4         0.873058         690          600           32.0  htewfpld 2022-07-22
5        -2.389884         442          400           30.0  hbcnfogu 2006-02-25
8         0.264584         432          400            2.0  ejbvbmwn 2013-05-11
9         0.813655         137          100           20.0  oecttada 2024-11-22,
    df_na=     number  number_big number_range  missing_value      text       date
1           -0.411354         363          300            NaN  juzecani 2014-12-02
6           -0.833857         531          500            NaN  ybnntryh 2023-11-03
7            1.355589         472          400            NaN  zjltghjr 2024-10-09
)

Added in version 3.1.0
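
The split itself amounts to two boolean masks. A plain-pandas sketch follows; it does not use the `SplittedDF` container, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "missing_value": [3.0, np.nan, 13.0]})

is_na = df["missing_value"].isna()
df_not_na = df[~is_na]  # rows with a value in the column
df_na = df[is_na]       # rows where the column is missing
print(df_not_na.index.tolist(), df_na.index.tolist())
```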

apply_notnull(col: str, callable: Callable[[Any], _R]) Self[source]

Apply callable only to the non-NaN values in the column

Parameters:
  • col (str) – Column to apply

  • callable (Callable[[Any], _R]) – Callable

Returns:

Applied DataFrame

Return type:

Self

Example:

>>> DADF.sample_df(5).apply_notnull("missing_value", lambda _: "REPLACED")
     number  number_big number_range missing_value      text       date
0  0.852218         157          100      REPLACED  dqzxaxxs 2006-03-08
1  1.522428         616          600           NaN  mivkaooe 2018-12-27
2  0.108506         745          700      REPLACED  qanwwjet 2005-07-14
3 -1.435079         400          400      REPLACED  ywahcasi 2024-05-20
4  0.118993         861          800      REPLACED  saoupuby 2019-04-28

Added in version 5.1.0
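
In plain pandas, a similar effect is available through `Series.map` with `na_action="ignore"`, which skips missing values. This is a sketch of the idea and not necessarily how the method is implemented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"missing_value": [3.0, np.nan, 13.0]})

# NaN entries are passed through untouched; other values go through the callable.
applied = df.assign(
    missing_value=df["missing_value"].map(lambda v: v * 2, na_action="ignore")
)
print(applied)
```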

apply_notnull_row(apply_when_null: Callable[[Any], _R] | _T | None = None, apply_when_not_null: Callable[[Any], _R] | _T | None = None, col_name: str | None = None) Self[source]

Add a column whose value depends on whether each row contains a missing value.

Parameters:
  • apply_when_null (Callable[[Any], _R] | _T | None, optional) – Callable or value used for rows that contain a null, by default None (the output is whether the entire row is not null)

  • apply_when_not_null (Callable[[Any], _R] | _T | None, optional) – Callable or value used for rows without any null, by default None (the output is whether the entire row is not null)

  • col_name (str | None, optional) – Output column name, by default None (uses custom name)

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> df = DADF({"A": [None, 2, 3, 4], "B": [1, None, 3, 4], "C": [None, 2, None, 4]})
>>> df.apply_notnull_row()
     A    B    C  applied_row_null
0  NaN  1.0  NaN             False
1  2.0  NaN  2.0             False
2  3.0  3.0  NaN             False
3  4.0  4.0  4.0              True
>>> df.apply_notnull_row(0, 1)
     A    B    C  applied_row_null
0  NaN  1.0  NaN                 0
1  2.0  NaN  2.0                 0
2  3.0  3.0  NaN                 0
3  4.0  4.0  4.0                 1
>>> df.apply_notnull_row(lambda _: "n", lambda _: "y", col_name="mod")
     A    B    C mod
0  NaN  1.0  NaN   n
1  2.0  NaN  2.0   n
2  3.0  3.0  NaN   n
3  4.0  4.0  4.0   y

Added in version 5.1.0
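
The default and the value-based calls above can be mirrored in plain pandas with a per-row completeness mask. This is a sketch; it only covers constant values, not callables.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [None, 2, 3, 4], "B": [1, None, 3, 4], "C": [None, 2, None, 4]})

# True only for rows where every cell is present.
complete = df.notna().all(axis=1)

flagged = df.assign(applied_row_null=complete)          # default behaviour
labelled = df.assign(mod=np.where(complete, "y", "n"))  # custom values and column name
print(labelled)
```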

class absfuyu.extra.da.dadf.DataAnalystDataFrameOtherMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: DataAnalystDataFrameBase

Data Analyst pd.DataFrame - Other method/Stuff

  • Merge left

merge_left(other: Self | DataFrame, on: str, columns: list[str] | None = None) Self[source]

Left merge of two DataFrames

Parameters:
  • other (Self | pd.DataFrame) – DataFrame to merge

  • on (str) – Merge on which column

  • columns (list[str] | None, optional) – Columns to take from other DataFrame, by default None (Take all columns)

Returns:

Merged DataFrame

Return type:

Self

Example:

>>> df1 = DADF({
...     "id": [1, 2, 5],
...     "name": ["Alice", "Bob", "Rich"],
...     "age": [20, 20, 20],
... })
>>> df2 = DADF({
...     "id": [1, 2, 3],
...     "age": [25, 30, 45],
...     "department": ["HR", "IT", "PM"],
...     "salary": [50000, 60000, 55000],
... })
>>> df1.merge_left(df2, on="id")
   id   name  age_x  age_y department   salary
0   1  Alice     20   25.0         HR  50000.0
1   2    Bob     20   30.0         IT  60000.0
2   5   Rich     20    NaN        NaN      NaN
>>> df1.merge_left(df2, on="id", columns=["salary"])
   id   name   age department   salary
0   1  Alice  25.0         HR  50000.0
1   2    Bob  30.0         IT  60000.0
2   5   Rich   NaN        NaN      NaN

Added in version 4.0.0
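
For reference, a left merge in plain pandas. The column selection here is one plausible reading of the `columns` parameter (take only the listed columns plus the key from the other frame) and is an assumption, not a statement about the library's behaviour.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 5], "name": ["Alice", "Bob", "Rich"]})
df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["HR", "IT", "PM"],
    "salary": [50000, 60000, 55000],
})

on = "id"
columns = ["salary"]

# Keep every row of df1; rows without a match in df2 get NaN.
merged = df1.merge(df2[[on, *columns]], on=on, how="left")
print(merged)
```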

class absfuyu.extra.da.dadf.DataAnalystDataFrameDateMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: DataAnalystDataFrameBase

Data Analyst pd.DataFrame - Date

  • Add date column from month column

  • Add detail date

  • Delta date (how many days in between)

add_date_from_month(month_column: str, *, col_name: str = 'date') Self[source]

Add dummy date column from month column

Parameters:
  • month_column (str) – Month column

  • col_name (str) – New date column name, by default: "date"

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> df = (
...     DADF.sample_df(2)
...     .add_detail_date("date", mode="m")
...     .drop_columns(["date", "number", "number_range"])
... )
>>> df
   number_big  missing_value      text  month
0         755            NaN  lincgqzl      4
1         907            NaN  gxltrjku     10
>>> df.add_date_from_month("month")
   number_big  missing_value      text  month       date
0         755            NaN  lincgqzl      4 2025-04-01
1         907            NaN  gxltrjku     10 2025-10-01
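
A plain-pandas sketch of building such a dummy date from a month number. The year is fixed to 2025 and the day to 1 here to match the example output; how the method chooses the year is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"month": [4, 10]})
year = 2025  # assumed; chosen to match the example output above

# to_datetime() can assemble dates from year/month/day columns.
parts = pd.DataFrame({"year": year, "month": df["month"], "day": 1})
dated = df.assign(date=pd.to_datetime(parts))
print(dated)
```
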
add_detail_date(date_column: str, mode: str = 'dwmy') Self[source]
Add these columns from date_column:
  • date (not added if date_column is already named "date")

  • day (overwritten if it already exists)

  • week (overwritten if it already exists)

  • month (overwritten if it already exists)

  • year (overwritten if it already exists)

Parameters:
  • date_column (str) – Date column

  • mode (str) –

    Detailed column to add
    d: day
    w: week number
    m: month
    y: year
    (Default: "dwmy")

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> df = DADF.sample_df(2)
>>> df
     number  number_big number_range  missing_value      text       date
0  0.331195         902          900             20  fgyanxik 2021-10-18
1 -0.877727         378          300             13  dqvaggjo 2007-03-06
>>> df.add_detail_date("date")
     number  number_big number_range  missing_value      text       date  day  week  month  year
0  0.331195         902          900             20  fgyanxik 2021-10-18   18    42     10  2021
1 -0.877727         378          300             13  dqvaggjo 2007-03-06    6    10      3  2007
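
The same detail columns can be derived in plain pandas through the `.dt` accessor. Using the ISO week number is an assumption, although it reproduces the week values in the example above.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2021-10-18", "2007-03-06"])})

d = df["date"].dt
detailed = df.assign(
    day=d.day,
    week=d.isocalendar().week,  # ISO 8601 week number
    month=d.month,
    year=d.year,
)
print(detailed)
```
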
delta_date(date_column: str, mode: Literal['now', 'between_row'] = 'now', *, col_name: str = 'delta_date') Self[source]

Calculate date interval

Parameters:
  • date_column (str) – Date column

  • mode (str) –

    Mode to calculate
    "between_row": Calculate date interval between each row
    "now": Calculate date interval to current date
    (Default: "now")

  • col_name (str) –

    New delta date column name
    (Default: "delta_date")

Returns:

Modified DataFrame

Return type:

Self

Example:

>>> df = DADF.sample_df(2)
>>> df
     number  number_big number_range  missing_value      text       date
0 -0.729988         435          400             21  xkrqqouf 2014-08-01
1 -0.846031         210          200              5  rbkmiqxt 2024-07-10
>>> df.delta_date("date")
     number  number_big number_range  missing_value      text       date  delta_date
0 -0.729988         435          400             21  xkrqqouf 2014-08-01        3873
1 -0.846031         210          200              5  rbkmiqxt 2024-07-10         242
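
Both modes can be sketched in plain pandas with timedelta arithmetic. A fixed reference date stands in for "now" so the result is reproducible; 2025-03-09 is chosen because it yields the figures in the example above. The direction of the row-to-row difference is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2014-08-01", "2024-07-10"])})
reference = pd.Timestamp("2025-03-09")  # stand-in for the current date

# "now"-style: whole days from each date to the reference date.
to_reference = (reference - df["date"]).dt.days

# "between_row"-style: whole days since the previous row (first row has no predecessor).
between_rows = df["date"].diff().dt.days
print(to_reference.tolist())
```
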
class absfuyu.extra.da.dadf.DataAnalystDataFrameCityMixin(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: DataAnalystDataFrameBase

Data Analyst pd.DataFrame - City

  • Convert city

convert_city(city_column: str, city_list: list[CityData], *, mode: str = 'ra') Self[source]

Get region and area of a city

Parameters:
  • city_column (str) – Column containing city data

  • city_list (list[CityData]) – List of cities in the correct format

  • mode (str) –

    Detailed column to add
    r: region
    a: area
    (Default: "ra")

Returns:

Modified DataFrame

Return type:

Self