Releases: caleb531/imessage-conversation-analyzer
v2.6.0
New Features
- You can now filter any analyzer by date and participant
  - These are available in the CLI via new --from-date, --to-date, and --from-person flags
  - These are also available in the Python API via new input parameters to ica.get_dataframes(): from_date, to_date, and from_person
  - See the README for details on how to use these new filters
- Added support for iOS 18's emoji-based reactions, which allow for reacting with any arbitrary emoji
  - This new support is mainly reflected in the Reactions metrics for the message_totals analyzer
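The effect of the date and participant filters can be sketched in plain Python. This is only an illustration, not ICA's implementation: the Message record and the filter_messages helper are hypothetical, and treating both date bounds as inclusive is an assumption made for the example (the README describes the actual behavior).

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class Message:
    # Hypothetical record type; ICA itself exposes messages as a dataframe
    sender: str
    sent_at: datetime
    text: str


def filter_messages(
    messages: list[Message],
    from_date: Optional[date] = None,
    to_date: Optional[date] = None,
    from_person: Optional[str] = None,
) -> list[Message]:
    """Keep only the messages that match every filter that was supplied."""
    kept = []
    for message in messages:
        sent_on = message.sent_at.date()
        # Assumption: both date bounds are inclusive
        if from_date is not None and sent_on < from_date:
            continue
        if to_date is not None and sent_on > to_date:
            continue
        if from_person is not None and message.sender != from_person:
            continue
        kept.append(message)
    return kept


messages = [
    Message("Jane", datetime(2024, 1, 26, 9, 30), "Good morning"),
    Message("Me", datetime(2024, 1, 27, 18, 5), "On my way"),
    Message("Jane", datetime(2024, 2, 2, 12, 0), "Lunch?"),
]
january_from_jane = filter_messages(
    messages,
    from_date=date(2024, 1, 1),
    to_date=date(2024, 1, 31),
    from_person="Jane",
)
```

Omitting a filter leaves that dimension unrestricted, so calling the helper with no filters returns every message.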
Fixes
- Fixed some incorrect logic for how YouTube, Spotify, and Apple Music links were counted within the attachment_totals analyzer
Housekeeping
- Added missing documentation for the count_phrases analyzer to the README
- Other organizational tweaks and improvements to the README
v2.5.0
New Features
- Added a new (built-in) count_phrases analyzer which allows you to count the number of case-insensitive occurrences of any arbitrary strings across all messages in a conversation (excluding reactions)
  - e.g. ica count_phrases -c 'Jane Fernbrook' 'i love you'
- Added a new prettify_index parameter to the ica.output_results function; if you specify it with a value of False, it will disable the default behavior of titleizing index values (see the new count_phrases analyzer for an example)
Deprecations
- The get_cli_args() function has been deprecated in favor of the new get_cli_parser() function
  - The get_cli_parser() function gives you access to the underlying argparse.ArgumentParser instance, allowing you to add new CLI arguments specific to your analyzer
  - To migrate, replace ica.get_cli_args() with ica.get_cli_parser().parse_args() across your project files
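To show why access to the parser is useful, here is a small sketch using only the standard argparse module. The get_cli_parser function below is a hypothetical stand-in for ica.get_cli_parser(), and the --contact and --min-count options are made up for the example; they are not the library's actual argument list.

```python
import argparse


def get_cli_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for ica.get_cli_parser(); the option defined here
    # is illustrative only
    parser = argparse.ArgumentParser(prog="my_analyzer")
    parser.add_argument("-c", "--contact", required=True)
    return parser


# Because the parser itself is returned, an analyzer can register its own
# arguments before any parsing happens
parser = get_cli_parser()
parser.add_argument("--min-count", type=int, default=1)  # analyzer-specific

# An explicit argument list is passed so that the sketch runs anywhere; a real
# analyzer would call parser.parse_args() with no arguments to read sys.argv
cli_args = parser.parse_args(["-c", "Jane Fernbrook", "--min-count", "5"])
```

A function that returns only the parsed arguments leaves no opportunity to add options, which is presumably the motivation for the deprecation.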
Under-the-Hood Improvements
- Upgraded all dependencies to their latest versions
- The CLI now raises an ImportError if a module spec cannot be created (this is unlikely, though)
- The __main__ entry point module is now fully tested, increasing the code coverage for the library
v2.4.0
v2.3.0
- Added a count for audio messages to the attachment_totals analyzer
- The exposed attachments dataframe has been updated to include columns for:
  - The filename of the attachment, if applicable
  - The ID of the associated message
- The messages dataframe has been updated to include a column for the ID of the message
v2.2.0
- Rewrote the most_frequent_emojis analyzer to be substantially faster and more accurate
- The time complexity of the algorithm has been reduced from O(n^2) to O(n), resulting in significant speedups (e.g. 10s to 3s, or 4s to 2s)
- The new algorithm also handles combined emojis correctly (e.g. 👨💻, which is a combination of 👨 and 💻, is now counted correctly)
- Small refactoring improvements to clean up the codebase
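One way to count emojis in a single pass while still handling combined emojis is sketched below. It is not the analyzer's actual algorithm: the three-entry emoji list is a hypothetical stand-in for a much larger one, and the technique shown (a precompiled regular expression whose alternatives are ordered longest first) is just one approach that scans each message once instead of searching every message once per emoji.

```python
import re
from collections import Counter

# Hypothetical, tiny emoji list used only for this sketch
MAN = "\U0001F468"
LAPTOP = "\U0001F4BB"
# A combined emoji: two emojis joined by a zero-width joiner (U+200D)
MAN_TECHNOLOGIST = MAN + "\u200D" + LAPTOP
EMOJIS = [MAN, LAPTOP, MAN_TECHNOLOGIST]

# Alternatives are tried left to right, so listing the longest sequences first
# lets a combined emoji match as a whole instead of as its parts
EMOJI_PATTERN = re.compile(
    "|".join(re.escape(emoji) for emoji in sorted(EMOJIS, key=len, reverse=True))
)


def count_emojis(messages: list[str]) -> Counter:
    """Tally every known emoji, scanning each message once."""
    counts = Counter()
    for text in messages:
        counts.update(EMOJI_PATTERN.findall(text))
    return counts


counts = count_emojis(["shipping it " + MAN_TECHNOLOGIST, MAN + " and " + LAPTOP])
```

With this ordering, the combined emoji in the first message is tallied once under its own key, and the standalone emojis in the second message are tallied separately.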
v2.1.0
- Fixed a bug where ICA could not infer the format from an *.md file extension when passing a Markdown file as an output path
- A FormatNotSupportedError has been added, and is now raised if the specified format is unsupported (either on the CLI via -f/--format, or when calling ica.output_results with the format parameter)
- Refactored ica.output_results tests to be much more robust
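The two behaviors above (inferring a format from the output path, and rejecting unknown formats) can be sketched together. Everything here is an assumption for illustration: the extension mapping, the resolve_format helper, and the locally defined exception class merely mirror the names in these notes and are not taken from ICA's source.

```python
from pathlib import Path
from typing import Optional


class FormatNotSupportedError(Exception):
    """Local stand-in so that the sketch is self-contained."""


# Assumed mapping of file extensions to format names
EXTENSION_FORMATS = {".csv": "csv", ".md": "markdown", ".xlsx": "excel"}
SUPPORTED_FORMATS = set(EXTENSION_FORMATS.values())


def resolve_format(
    format_name: Optional[str] = None, output: Optional[str] = None
) -> Optional[str]:
    """Prefer an explicit format; otherwise infer one from the output path."""
    if format_name is None and output is not None:
        # Unknown extensions fall through to None (the default format)
        format_name = EXTENSION_FORMATS.get(Path(output).suffix.lower())
    if format_name is not None and format_name not in SUPPORTED_FORMATS:
        raise FormatNotSupportedError(f"unsupported format: {format_name}")
    return format_name


inferred = resolve_format(output="./my_transcript.md")
```

Raising a dedicated exception type lets callers distinguish a bad format name from other failures, such as an unwritable output path.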
v2.0.0
ICA v2 is the next major release of the library, representing as significant a milestone as the initial v1 release!
https://pypi.org/project/imessage-conversation-analyzer/
TL;DR
- In addition to the CLI, a comprehensive Python API has been added so that you can write custom programs to integrate with the library more easily
- It adds support for many more emoji
- It fixes some major bugs and makes the tool more intuitive to use
- It adds support for writing to Excel files
- It adds support for non-US phone numbers
- It adds timezone support to eliminate any potential for date/time ambiguity
Python API
Most notable is the addition of a fully-typed Python API, which allows you to write custom analyzers that integrate with ICA with greater power and flexibility.
v1 had a concept of "metric files", which were rather limited in capability because they could only be called via the CLI and did not allow for post-processing.
In v2, these "metric files" have been re-dubbed "analyzers" for better clarity, and the new Python API allows you to import the ica package in your module.
This new API was designed to be adaptable to different kinds of needs. That is, the processing of the message data provided by the library can be as simple or as sophisticated as you'd like. For example, you can either choose to integrate with the built-in CLI, or you can write your own processing logic.
We encourage you to look at the built-in analyzer modules as examples of how to use this new API.
Improved Emoji Support
Previously, ICA only supported a small subset of emoji for the "Most Frequent" analyzer. ICA v2 adds support for over 1,800 of the emoji supported by the Unicode standard. This should cover the majority of emojis that people use in their message conversations.
Parsing of Typedstream-Encoded Message Data
Certain messages in the macOS message database are encoded using Apple's binary typedstream format in a special attributedBody column. In ICA v1, these types of messages could not be parsed and therefore were excluded from the dataset and from certain analytics (like emoji counts).
In ICA v2, new logic has been added to decode these typedstream-encoded messages and merge them into the main dataset, thanks to help from the pytypedstream package. This means that you can have confidence that ICA will analyze the entirety of your message data for a conversation, not merely a subset of it.
Excel Support
The CLI and the Python API now support outputting your analyzer dataframe to Excel. This is achieved by specifying the new -o/--output flag on the CLI with a file path ending in .xlsx. You can also pass --format=xlsx if you want to capture or redirect the binary output for your own purposes.
For the Python API, you can pass the output parameter to ica.output_results() with an .xlsx file path. Alternatively, you can pass format='excel', with output as a BytesIO object.
ica transcript -c 'Thomas Riverstone' -o ./my_transcript.xlsx
ica.output_results(my_df, output='./my_transcript.xlsx')
Timezone Support
Previously, all dates/times in ICA v1 would assume the local system timezone of the user running the CLI. In v2, this is still the default behavior, but a new -t/--timezone option (or timezone parameter for ica.get_dataframes) has been added. This new parameter accepts any IANA timezone name (e.g. America/New_York or UTC).
ica message_totals -c 'John Doe' -t UTC
dfs = ica.get_dataframes(contact_name=my_contact_name, timezone='UTC')
Default Format Changes
The default format (i.e. when you omit the --format/-f/format option) has changed slightly from using the tabulate package to using pandas.DataFrame.to_string. This improves the consistency of the API to allow for writing data in the default format to a buffer or file (like other formats).
Before:
Date Total
------------------- -------
2024-01-26 00:00:00 12
2024-01-27 00:00:00 45
2024-01-28 00:00:00 56
After:
Date Total
2024-01-26 12
2024-01-27 45
2024-01-28 56
Support for Non-US Phone Numbers
ICA v2 now integrates with the phonenumbers package to standardize the parsing of phone numbers when looking up the conversation for a particular contact. A benefit of this integration is that non-US phone numbers are supported.
Dependency Upgrades and Changes
All project dependencies have been updated to their latest versions:
Upgraded (Existing) Dependencies
New Dependencies
- openpyxl (for reading and writing Excel files)
- pyarrow (per the recommendation of pandas v2)
- phonenumbers (to standardize the parsing of contact phone numbers)
- tzlocal (for determining the local timezone of the user's system)
Full Test Suite
ICA v2 adds a full test suite, boasting 96% code coverage across the entire codebase. This includes tests for the core ica package and all built-in analyzers, for both the Python API and the CLI utility. With this, you may have greater confidence that the package will behave correctly in all the relevant cases.
CLI Changes
You may have noticed with the above examples that the Command Line API has also changed slightly. The -m parameter has been dropped in favor of specifying the analyzer name as a single positional parameter.
Before:
ica -c 'John Doe' -m ica/metrics/message_totals.py -f csv
After:
ica message_totals -c 'John Doe' -f csv
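The shape of the new command line can be modeled with a few lines of argparse. This is a sketch of the argument layout only, not ICA's actual parser: the --contact long option name is an assumption, and only the options that appear in the example above are declared.

```python
import argparse

parser = argparse.ArgumentParser(prog="ica")
# The analyzer is now a required positional argument rather than an -m option
parser.add_argument("analyzer")
parser.add_argument("-c", "--contact", required=True)  # long name is assumed
parser.add_argument("-f", "--format")

# Equivalent to running: ica message_totals -c 'John Doe' -f csv
args = parser.parse_args(["message_totals", "-c", "John Doe", "-f", "csv"])
```

Because the analyzer is positional, it can be written before or after the options on the command line.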
Bug Fixes
- Emojis with a count of zero are now excluded from the "Most Frequent Emojis" data
- Dates with no messages sent are now excluded from the "Totals by Day" analyzer
- Fixed "Days Missed" and "Days with No Reply" calculation for the "Message Totals" analyzer
- Fixed compatibility with systems running versions of sqlite3 older than v3.39.0
Beyond that, there is a wealth of other small improvements to refactor and polish up the codebase.
No changes since beta 1; the release notes are largely copied from the beta 1 release notes
v2.0.0-beta.1
ICA v2 is the next major release of the library, representing as significant a milestone as the initial v1 release!
TL;DR
- In addition to the CLI, a comprehensive Python API has been added so that you can write custom programs to integrate with the library more easily
- It adds support for many more emoji
- It fixes some major bugs and makes the tool more intuitive to use
- It adds support for writing to Excel files
- It adds support for non-US phone numbers
- It adds timezone support to eliminate any potential for date/time ambiguity
Python API
Most notable is the addition of a fully-typed Python API, which allows you to write custom analyzers that integrate with ICA with greater power and flexibility.
v1 had a concept of "metric files", which were rather limited in capability because they could only be called via the CLI and did not allow for post-processing.
In v2, these "metric files" have been re-dubbed "analyzers" for better clarity, and the new Python API allows you to import the ica package in your module.
This new API was designed to be adaptable to different kinds of needs. That is, the processing of the message data provided by the library can be as simple or as sophisticated as you'd like. For example, you can either choose to integrate with the built-in CLI, or you can write your own processing logic.
We encourage you to look at the built-in analyzer modules as examples of how to use this new API.
Improved Emoji Support
Previously, ICA only supported a small subset of emoji for the "Most Frequent" analyzer. ICA v2 adds support for over 1,800 of the emoji supported by the Unicode standard. This should cover the majority of emojis that people use in their message conversations.
Parsing of Typedstream-Encoded Message Data
Certain messages in the macOS message database are encoded using Apple's binary typedstream format in a special attributedBody column. In ICA v1, these types of messages could not be parsed and therefore were excluded from the dataset and from certain analytics (like emoji counts).
In ICA v2, new logic has been added to decode these typedstream-encoded messages and merge them into the main dataset, thanks to help from the pytypedstream package. This means that you can have confidence that ICA will analyze the entirety of your message data for a conversation, not merely a subset of it.
Excel Support
The CLI and the Python API now support outputting your analyzer dataframe to Excel. This is achieved by specifying the new -o/--output flag on the CLI with a file path ending in .xlsx. You can also pass --format=xlsx if you want to capture or redirect the binary output for your own purposes.
For the Python API, you can pass the output parameter to ica.output_results() with an .xlsx file path. Alternatively, you can pass format='excel', with output as a BytesIO object.
ica transcript -c 'Thomas Riverstone' -o ./my_transcript.xlsx
ica.output_results(my_df, output='./my_transcript.xlsx')
Timezone Support
Previously, all dates/times in ICA v1 would assume the local system timezone of the user running the CLI. In v2, this is still the default behavior, but a new -t/--timezone option (or timezone parameter for ica.get_dataframes) has been added. This new parameter accepts any IANA timezone name (e.g. America/New_York or UTC).
ica message_totals -c 'John Doe' -t UTC
dfs = ica.get_dataframes(contact_name=my_contact_name, timezone='UTC')
Default Format Changes
The default format (i.e. when you omit the --format/-f/format option) has changed slightly from using the tabulate package to using pandas.DataFrame.to_string. This improves the consistency of the API to allow for writing data in the default format to a buffer or file (like other formats).
Before:
Date Total
------------------- -------
2024-01-26 00:00:00 12
2024-01-27 00:00:00 45
2024-01-28 00:00:00 56
After:
Date Total
2024-01-26 12
2024-01-27 45
2024-01-28 56
Support for Non-US Phone Numbers
ICA v2 now integrates with the phonenumbers package to standardize the parsing of phone numbers when looking up the conversation for a particular contact. A benefit of this integration is that non-US phone numbers are supported.
Dependency Upgrades and Changes
All project dependencies have been updated to their latest versions:
Upgraded (Existing) Dependencies
New Dependencies
- openpyxl (for reading and writing Excel files)
- pyarrow (per the recommendation of pandas v2)
- phonenumbers (to standardize the parsing of contact phone numbers)
- tzlocal (for determining the local timezone of the user's system)
Full Test Suite
ICA v2 adds a full test suite, boasting 96% code coverage across the entire codebase. This includes tests for the core ica package and all built-in analyzers, for both the Python API and the CLI utility. With this, you may have greater confidence that the package will behave correctly in all the relevant cases.
CLI Changes
You may have noticed with the above examples that the Command Line API has also changed slightly. The -m parameter has been dropped in favor of specifying the analyzer name as a single positional parameter.
Before:
ica -c 'John Doe' -m ica/metrics/message_totals.py -f csv
After:
ica message_totals -c 'John Doe' -f csv
Bug Fixes
- Emojis with a count of zero are now excluded from the "Most Frequent Emojis" data
- Dates with no messages sent are now excluded from the "Totals by Day" analyzer
- Fixed "Days Missed" and "Days with No Reply" calculation for the "Message Totals" analyzer
- Fixed compatibility with systems running versions of sqlite3 older than v3.39.0
Beyond that, there is a wealth of other small improvements to refactor and polish up the codebase.