Releases: caleb531/imessage-conversation-analyzer

v2.6.0

02 Feb 00:34
0e1ee3c

New Features

  • You can now filter any analyzer by date and participant
    • These are available in the CLI via new --from-date, --to-date, and --from-person flags
    • These are also available in the Python API via new input parameters to ica.get_dataframes(): from_date, to_date, and from_person
    • See the README for details on how to use these new filters
  • Added support for iOS 18's emoji-based reactions that allow for reacting with any arbitrary emoji
    • This new support is mainly reflected in the Reactions metrics for the message_totals analyzer
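
As a rough illustration of what the date and participant filters do, here is a minimal standard-library sketch. It is not ICA's implementation: the Message class and the filter_messages helper are hypothetical, only the parameter names (from_date, to_date, from_person) mirror the ones listed above, and treating the end date as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Message:
    # Hypothetical stand-in for one row of message data
    sender: str
    sent_at: datetime
    text: str


def filter_messages(messages, from_date=None, to_date=None, from_person=None):
    """Keep messages inside the date range that were sent by from_person."""
    kept = []
    for msg in messages:
        day = msg.sent_at.date()
        if from_date is not None and day < from_date:
            continue
        # Assumption: the end date is inclusive in this sketch
        if to_date is not None and day > to_date:
            continue
        if from_person is not None and msg.sender != from_person:
            continue
        kept.append(msg)
    return kept


messages = [
    Message("Jane", datetime(2024, 1, 26, 9, 30), "Good morning"),
    Message("Me", datetime(2024, 1, 27, 12, 0), "Lunch?"),
    Message("Jane", datetime(2024, 2, 3, 18, 45), "See you soon"),
]
january_from_jane = filter_messages(
    messages,
    from_date=date(2024, 1, 1),
    to_date=date(2024, 1, 31),
    from_person="Jane",
)
print([m.text for m in january_from_jane])
```

Each filter is optional, so omitting all three returns every message unchanged.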

Fixes

  • Fixed some incorrect logic for how YouTube, Spotify, and Apple Music links were counted within the attachment_totals analyzer

Housekeeping

  • Added missing documentation for the count_phrases analyzer to the README
  • Other organizational tweaks and improvements to the README

v2.5.0

03 Jan 18:55
bbe5dbe

New Features

  • Added a new (built-in) count_phrases analyzer which allows you to count the number of case-insensitive occurrences of any arbitrary strings across all messages in a conversation (excluding reactions)
    • e.g. ica count_phrases -c 'Jane Fernbrook' 'i love you'
  • Added a new prettify_index parameter to the ica.output_results function; if you specify it with a value of False, it will disable the default behavior of titleizing index values (see the new count_phrases analyzer for an example)
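
The case-insensitive counting that count_phrases performs can be pictured with the short sketch below. It is only an approximation under stated assumptions: the function operates on a plain list of strings rather than ICA's dataframes, and it counts non-overlapping occurrences, which may not match the analyzer's exact rules.

```python
def count_phrases(messages, phrases):
    """Count case-insensitive occurrences of each phrase across all messages."""
    counts = {}
    for phrase in phrases:
        needle = phrase.casefold()
        # str.count() tallies non-overlapping matches within each message
        counts[phrase] = sum(text.casefold().count(needle) for text in messages)
    return counts


messages = ["I love you", "i LOVE you too, love", "See you tomorrow"]
result = count_phrases(messages, ["i love you", "love"])
print(result)
```

Because the phrases themselves become the index of the result, a caller would likely want them shown exactly as typed, which is the situation the prettify_index parameter addresses.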

Deprecations

  • The get_cli_args() function has been deprecated in favor of the new get_cli_parser() function
    • The get_cli_parser() function gives you access to the underlying argparse.ArgumentParser instance, allowing you to add new CLI arguments specific to your analyzer
    • To migrate, replace ica.get_cli_args() with ica.get_cli_parser().parse_args() across your project files
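
To show why access to the parser is useful, here is a small self-contained sketch. The get_cli_parser function below is a local stand-in, not the real ica.get_cli_parser(); the --contact-name long flag and the --min-count argument are assumptions made up for illustration.

```python
import argparse


def get_cli_parser():
    """Hypothetical stand-in for ica.get_cli_parser(); the real parser
    defines its own set of arguments."""
    parser = argparse.ArgumentParser(prog="my_analyzer")
    parser.add_argument("-c", "--contact-name")
    return parser


parser = get_cli_parser()
# Add an argument specific to this analyzer (the flag name is invented)
parser.add_argument("--min-count", type=int, default=1)

# Passing an explicit list keeps the example independent of sys.argv
args = parser.parse_args(["-c", "Jane Fernbrook", "--min-count", "5"])
print(args.contact_name, args.min_count)
```

With the deprecated get_cli_args() approach, only the already-parsed arguments were returned, so there was no point at which a custom argument could be registered.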

Under-the-Hood Improvements

  • Upgraded all dependencies to their latest versions
  • The CLI now raises an ImportError if a module spec cannot be created (this is unlikely, though)
  • The __main__ entry point module is now fully tested, increasing the code coverage for the library

v2.4.0

27 Dec 21:17
71fe886
  • Upgraded dependencies to latest versions
    • EDIT: the dependency upgrade actually never got merged; this will be fixed in the next release

v2.3.0

06 Mar 23:09
be7ecf3
  • Added a count for audio messages to the attachment_totals analyzer
  • The exposed attachments dataframe has been updated to include columns for:
    • The filename of the attachment, if applicable
    • The ID of the associated message
  • The messages dataframe has been updated to include a column for the ID of the message

v2.2.0

22 Feb 04:48
75445ca
  • Rewrote the most_frequent_emojis analyzer to be substantially faster and more accurate
    • The time complexity of the algorithm has been reduced from O(n^2) to O(n), resulting in significant speedups (e.g. 10s to 3s, or 4s to 2s)
    • The new algorithm also handles combined emojis correctly (e.g. 👨‍💻, which is a combination of 👨 and 💻, is now counted correctly)
  • Small refactoring improvements to clean up the codebase
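
One way to count emojis in a single pass over each message while keeping combined emojis intact is sketched below. This is not necessarily the algorithm the analyzer uses: the four-entry emoji list is a tiny stand-in, and the longest-first regular expression is simply one technique that matches a joined sequence before its component parts.

```python
import re
from collections import Counter

TECHNOLOGIST = "\U0001F468\u200D\U0001F4BB"  # man + zero-width joiner + laptop
MAN = "\U0001F468"
LAPTOP = "\U0001F4BB"
PARTY = "\U0001F389"

# Stand-in emoji list; a real analyzer would load a far larger set
KNOWN_EMOJIS = [MAN, LAPTOP, PARTY, TECHNOLOGIST]

# Longest-first ordering makes the combined sequence win over its parts
EMOJI_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(KNOWN_EMOJIS, key=len, reverse=True))
)


def count_emojis(messages):
    """Scan each message once and tally every emoji match."""
    counts = Counter()
    for text in messages:
        counts.update(EMOJI_PATTERN.findall(text))
    return counts


counts = count_emojis([f"{TECHNOLOGIST} shipped it {PARTY}", f"{MAN} and {LAPTOP}"])
print(counts)
```

A naive approach that calls text.count() once per known emoji would rescan every message for every emoji, and would also count the parts of a combined emoji separately.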

v2.1.0

08 Feb 23:14
660728d
  • Fixed a bug where ICA could not infer the format from an *.md file extension when passing a Markdown file as an output path
  • A FormatNotSupportedError has been added, and is now raised if the specified format is unsupported (either on the CLI via -f/--format, or when calling ica.output_results with the format parameter)
  • Refactored ica.output_results tests to be much more robust
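
The behavior described above, inferring a format from the output path and failing loudly on anything unknown, might look roughly like this. The exception class here is a local stand-in that only shares its name with ICA's FormatNotSupportedError, and both the infer_format helper and the extension-to-format mapping are assumptions for illustration.

```python
from pathlib import Path


class FormatNotSupportedError(Exception):
    """Local stand-in for the error of the same name described above."""


# Assumed mapping from file extension to format name
EXTENSION_TO_FORMAT = {".csv": "csv", ".md": "markdown", ".xlsx": "excel"}


def infer_format(output_path):
    """Return the format implied by the file extension of output_path."""
    extension = Path(output_path).suffix.lower()
    if extension not in EXTENSION_TO_FORMAT:
        raise FormatNotSupportedError(
            f"cannot infer a supported format from {output_path!r}"
        )
    return EXTENSION_TO_FORMAT[extension]


print(infer_format("report.md"))
print(infer_format("./my_transcript.XLSX"))
```

Raising a dedicated exception type lets callers distinguish an unsupported format from other failures, such as an unwritable path.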

v2.0.0

06 Feb 20:38
a27ea6e

ICA v2 is the next major release of the library, and it represents as significant a milestone as the initial v1 release!
https://pypi.org/project/imessage-conversation-analyzer/

TL;DR

  1. In addition to the CLI, a comprehensive Python API has been added so that you can write custom programs to integrate with the library more easily
  2. It adds support for many more emoji
  3. It fixes some major bugs and makes the tool more intuitive to use
  4. It adds support for writing to Excel files
  5. It adds support for non-US phone numbers
  6. It adds timezone support to eliminate any potential for date/time ambiguity

Python API

Most notable is the addition of a fully-typed Python API which allows you to write custom analyzers that integrate with ICA with greater power and flexibility.

v1 had a concept of "metric files", which were rather limited in capability because they could only be called via the CLI and did not allow for post-processing.

In v2, these "metric files" have been re-dubbed "analyzers" for better clarity, and the new Python API allows for importing of the ica package in your module.

This new API was designed to be adaptable to different kinds of needs. That is, the processing of the message data provided by the library can be as simple or as sophisticated as you'd like. For example, you can either choose to integrate with the built-in CLI, or you can write in your own processing logic.

We encourage you to look at the built-in analyzer modules as examples of how to use this new API.

Improved Emoji Support

Previously, ICA only supported a small subset of emoji for the "Most Frequent" analyzer. ICA v2 adds support for over 1,800 of the emoji supported by the Unicode standard. This should cover the majority of emojis that people use in their message conversations.

Parsing of Typedstream-Encoded Message Data

Certain messages in the macOS message database are encoded using Apple's binary typedstream format in a special attributedBody column. In ICA v1, these types of messages could not be parsed and therefore were excluded from the dataset and from certain analytics (like emoji counts).

In ICA v2, new logic has been added to decode these typedstream-encoded messages and merge them into the main dataset, thanks to help from the pytypedstream package. This means that you can have confidence that ICA will analyze the entirety of your message data for a conversation, not merely a subset of it.

Excel Support

The CLI and the Python API now support outputting your analyzer dataframe to Excel. This is achieved by specifying the new -o/--output flag on the CLI with a file path ending in .xlsx. You can also pass --format=xlsx if you want to capture or redirect the binary output for your own purposes.

For the Python API, you can pass the output parameter to ica.output_results() with an .xlsx file path. Alternatively, you can pass format='excel', with output as a BytesIO object.

ica transcript -c 'Thomas Riverstone' -o ./my_transcript.xlsx
ica.output_results(my_df, output='./my_transcript.xlsx')

Timezone Support

Previously, all dates/times in ICA v1 would assume the local system timezone of the user running the CLI. In v2, this is still the default behavior, but a new -t/--timezone option (or timezone parameter for ica.get_dataframes) has been added. This new parameter accepts any IANA timezone name (e.g. America/New_York or UTC).

ica message_totals -c 'John Doe' -t UTC
dfs = ica.get_dataframes(contact_name=my_contact_name, timezone='UTC')
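
To see why the chosen timezone matters, the standard-library sketch below converts one UTC timestamp into two IANA timezones. It does not use ICA at all; the localize helper is hypothetical, and the zoneinfo module requires Python 3.9 or later along with timezone data on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def localize(utc_dt, tz_name):
    """Convert an aware UTC datetime to the given IANA timezone."""
    return utc_dt.astimezone(ZoneInfo(tz_name))


# A message sent at 03:30 UTC on January 27
sent_at = datetime(2024, 1, 27, 3, 30, tzinfo=timezone.utc)

new_york = localize(sent_at, "America/New_York")
utc = localize(sent_at, "UTC")
print(new_york.date(), utc.date())
```

The same message falls on January 26 in New York but January 27 in UTC, so per-day totals can shift depending on which timezone is applied.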

Default Format Changes

The default format (i.e. when you omit the --format/-f/format option) has changed slightly from using the tabulate package to using pandas.DataFrame.to_string. This improves the consistency of the API to allow for writing data in the default format to a buffer or file (like other formats).

Before:

Date                   Total
-------------------  -------
2024-01-26 00:00:00       12
2024-01-27 00:00:00       45
2024-01-28 00:00:00       56

After:

Date        Total
2024-01-26     12
2024-01-27     45
2024-01-28     56

Support for Non-US Phone Numbers

ICA v2 now integrates with the phonenumbers package to standardize the parsing of phone numbers when looking up the conversation for a particular contact. A benefit of this integration is that non-US phone numbers are supported.

Dependency Upgrades and Changes

All project dependencies have been updated to their latest versions:

Upgraded (Existing) Dependencies

  • pandas has been upgraded to v2.2.0
  • tabulate has been upgraded to v0.9.0

New Dependencies

  • openpyxl (for reading and writing Excel files)
  • pyarrow (per the recommendation of pandas v2)
  • phonenumbers (to standardize the parsing of contact phone numbers)
  • tzlocal (for determining the local timezone of the user's system)

Full Test Suite

ICA v2 adds a full test suite, boasting 96% code coverage across the entire codebase. This includes tests for the core ica package and all built-in analyzers, for both the Python API and the CLI utility. With this, you may have greater confidence that the package will behave correctly in all the relevant cases.

CLI Changes

You may have noticed with the above examples that the Command Line API has also changed slightly. The -m parameter has been dropped in favor of specifying the analyzer name as a single positional parameter.

Before:

ica -c 'John Doe' -m ica/metrics/message_totals.py -f csv

After:

ica message_totals -c 'John Doe' -f csv
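
The shape of the new command line can be modeled with a few lines of argparse. This is a simplified sketch rather than ICA's actual parser: the -f/--format flag appears in these notes, but the --contact-name long form and the required setting are assumptions.

```python
import argparse

parser = argparse.ArgumentParser(prog="ica")
# The analyzer is now a single positional argument instead of a -m flag
parser.add_argument("analyzer")
parser.add_argument("-c", "--contact-name", required=True)
parser.add_argument("-f", "--format")

# Equivalent to: ica message_totals -c 'John Doe' -f csv
args = parser.parse_args(["message_totals", "-c", "John Doe", "-f", "csv"])
print(args.analyzer, args.contact_name, args.format)
```

Because the analyzer is positional, it can appear before or after the optional flags.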

Bug Fixes

  1. Emojis with a count of zero are now excluded from the "Most Frequent Emojis" data
  2. Dates with no messages sent are now excluded from the "Totals by Day" analyzer
  3. Fixed "Days Missed" and "Days with No Reply" calculation for the "Message Totals" analyzer
  4. Fixed compatibility with systems running versions of sqlite3 older than v3.39.0

Beyond that, there is a wealth of other small improvements to refactor and polish up the codebase.

No changes since beta 1; the release notes are largely copied from the beta 1 release notes

v2.0.0-beta.1

06 Feb 00:25
5fad377
Pre-release

ICA v2 is the next major release of the library, and it represents as significant a milestone as the initial v1 release!

TL;DR

  1. In addition to the CLI, a comprehensive Python API has been added so that you can write custom programs to integrate with the library more easily
  2. It adds support for many more emoji
  3. It fixes some major bugs and makes the tool more intuitive to use
  4. It adds support for writing to Excel files
  5. It adds support for non-US phone numbers
  6. It adds timezone support to eliminate any potential for date/time ambiguity

Python API

Most notable is the addition of a fully-typed Python API which allows you to write custom analyzers that integrate with ICA with greater power and flexibility.

v1 had a concept of "metric files", which were rather limited in capability because they could only be called via the CLI and did not allow for post-processing.

In v2, these "metric files" have been re-dubbed "analyzers" for better clarity, and the new Python API allows for importing of the ica package in your module.

This new API was designed to be adaptable to different kinds of needs. That is, the processing of the message data provided by the library can be as simple or as sophisticated as you'd like. For example, you can either choose to integrate with the built-in CLI, or you can write in your own processing logic.

We encourage you to look at the built-in analyzer modules as examples of how to use this new API.

Improved Emoji Support

Previously, ICA only supported a small subset of emoji for the "Most Frequent" analyzer. ICA v2 adds support for over 1,800 of the emoji supported by the Unicode standard. This should cover the majority of emojis that people use in their message conversations.

Parsing of Typedstream-Encoded Message Data

Certain messages in the macOS message database are encoded using Apple's binary typedstream format in a special attributedBody column. In ICA v1, these types of messages could not be parsed and therefore were excluded from the dataset and from certain analytics (like emoji counts).

In ICA v2, new logic has been added to decode these typedstream-encoded messages and merge them into the main dataset, thanks to help from the pytypedstream package. This means that you can have confidence that ICA will analyze the entirety of your message data for a conversation, not merely a subset of it.

Excel Support

The CLI and the Python API now support outputting your analyzer dataframe to Excel. This is achieved by specifying the new -o/--output flag on the CLI with a file path ending in .xlsx. You can also pass --format=xlsx if you want to capture or redirect the binary output for your own purposes.

For the Python API, you can pass the output parameter to ica.output_results() with an .xlsx file path. Alternatively, you can pass format='excel', with output as a BytesIO object.

ica transcript -c 'Thomas Riverstone' -o ./my_transcript.xlsx
ica.output_results(my_df, output='./my_transcript.xlsx')

Timezone Support

Previously, all dates/times in ICA v1 would assume the local system timezone of the user running the CLI. In v2, this is still the default behavior, but a new -t/--timezone option (or timezone parameter for ica.get_dataframes) has been added. This new parameter accepts any IANA timezone name (e.g. America/New_York or UTC).

ica message_totals -c 'John Doe' -t UTC
dfs = ica.get_dataframes(contact_name=my_contact_name, timezone='UTC')

Default Format Changes

The default format (i.e. when you omit the --format/-f/format option) has changed slightly from using the tabulate package to using pandas.DataFrame.to_string. This improves the consistency of the API to allow for writing data in the default format to a buffer or file (like other formats).

Before:

Date                   Total
-------------------  -------
2024-01-26 00:00:00       12
2024-01-27 00:00:00       45
2024-01-28 00:00:00       56

After:

Date        Total
2024-01-26     12
2024-01-27     45
2024-01-28     56

Support for Non-US Phone Numbers

ICA v2 now integrates with the phonenumbers package to standardize the parsing of phone numbers when looking up the conversation for a particular contact. A benefit of this integration is that non-US phone numbers are supported.

Dependency Upgrades and Changes

All project dependencies have been updated to their latest versions:

Upgraded (Existing) Dependencies

  • pandas has been upgraded to v2.2.0
  • tabulate has been upgraded to v0.9.0

New Dependencies

  • openpyxl (for reading and writing Excel files)
  • pyarrow (per the recommendation of pandas v2)
  • phonenumbers (to standardize the parsing of contact phone numbers)
  • tzlocal (for determining the local timezone of the user's system)

Full Test Suite

ICA v2 adds a full test suite, boasting 96% code coverage across the entire codebase. This includes tests for the core ica package and all built-in analyzers, for both the Python API and the CLI utility. With this, you may have greater confidence that the package will behave correctly in all the relevant cases.

CLI Changes

You may have noticed with the above examples that the Command Line API has also changed slightly. The -m parameter has been dropped in favor of specifying the analyzer name as a single positional parameter.

Before:

ica -c 'John Doe' -m ica/metrics/message_totals.py -f csv

After:

ica message_totals -c 'John Doe' -f csv

Bug Fixes

  1. Emojis with a count of zero are now excluded from the "Most Frequent Emojis" data
  2. Dates with no messages sent are now excluded from the "Totals by Day" analyzer
  3. Fixed "Days Missed" and "Days with No Reply" calculation for the "Message Totals" analyzer
  4. Fixed compatibility with systems running versions of sqlite3 older than v3.39.0

Beyond that, there is a wealth of other small improvements to refactor and polish up the codebase.

v1.2.3

20 Jan 19:21
b2c2958
  • Fixed the CLI program failing to run due to a number of missing file errors
    • Everyone is strongly encouraged to update to this version

v1.2.1

19 Jan 03:08
040904a
  • Fixed a critical bug affecting the v1.2.0 distributions where the emojis data was missing, thus causing the most_frequent_emojis and least_frequent_emojis analyzers to raise an exception.