Feature request: Parse DMs, add user names and handles #6

Closed
PinguTS opened this issue Nov 10, 2022 · 15 comments · Fixed by #78

Comments

@PinguTS

PinguTS commented Nov 10, 2022

The Twitter archive as currently downloaded omits all user names and handles. It only contains the IDs of the accounts that someone interacted with. As a result, the archive loses context, especially for the DMs and replies.

@timhutton
Owner

timhutton commented Nov 10, 2022

I see lots of usernames and handles in both the twitter archive and the markdown produced by this script. Can you give an example of where data is missing?

The twitter archive does lose a lot of context - it only contains your tweets and replies, not entire threads.

This script doesn't currently attempt to parse the DMs.

@duracell

Same here, would love to see handles/usernames in the dm section.

@timhutton
Owner

timhutton commented Nov 11, 2022

I'm struggling to understand what's being asked here.

If this is a bug report: Please be precise about what the script did and what you were expecting it to do instead. It sounds like you are talking about missing data in Twitter's archive? If so then that's a bug for Twitter I would have thought?

If this is a feature request: Please give more details in what you would like the script to do. Currently it doesn't do anything with DMs.

@duracell

I can only speak for myself, but from my POV this is a feature request.
The json for the direct messages has only an id, it would be great if the handle and name (and maybe even the picture) could be resolved and saved. Maybe this isn't the scope and a dedicated script would be better, idk.

@timhutton changed the title from "Enrico with user names and handles" to "Feature request: Parse DMs, add user names and handles" on Nov 11, 2022
@timhutton
Owner

OK. I have changed the title to reflect my understanding. I don't have any immediate plans to address this but maybe someone else would want to take a look. There are many other twitter archive parsers out there that may well already do this.

@duracell

Great :)
Any recommendations which one can do this? Searched but couldn't find any :(

@n1ckfg

n1ckfg commented Nov 13, 2022

I made a tool to turn those archive IDs into name, bio, and real url: https://gist.github.com/n1ckfg/df70c6fa1dabac4fe55cb551364adcc5

@flauschzelle
Collaborator

I made a script to parse user IDs and map them to handles. It is different from the scripts linked above in that it doesn't need login or access to Twitter's API, because it uses the TweeterID web service to look up the handles. It also finds some of the handles in the archive itself (looking in mentions and retweets). Sometimes it also finds display names and links, but it can't look up the bio or profile picture yet.

Currently, it just writes the mappings into a JSON file, but you might want to use it already anyway, in case Twitter goes down even faster than expected...

The script is available in the userids branch in my fork of this project:
https://github.com/flauschzelle/twitter-archive-parser/tree/userids

@lenaschimmel and I are working on integrating it into the main parser script and will probably make a pull request to the main project here later. But integrating it properly might take a few days, so if you're in a hurry, feel free to use my version in the meantime :)

@timhutton
Owner

@flauschzelle Thanks for looking into this. I was just looking at the JSON for this myself:

if 'in_reply_to_user_id' in tweet and 'in_reply_to_screen_name' in tweet:
  user_id_to_handle[tweet['in_reply_to_user_id']] = tweet['in_reply_to_screen_name']

For my archive this gives me 234 handles and is enough for making a start on parsing DMs, followers/followings.

Maybe we should get that basic functionality working and then add the lookup feature afterwards?
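
For illustration, the snippet above can be wrapped into a small self-contained function. This is only a sketch: the name `collect_handles` is hypothetical, and it assumes each element of `tweets` is already the inner tweet dictionary from the archive JSON rather than a wrapper object.

```python
def collect_handles(tweets):
    """Build a user-id -> handle mapping from the reply metadata of tweets.

    Assumption: each item in `tweets` is a dict that may carry the keys
    'in_reply_to_user_id' and 'in_reply_to_screen_name'.
    """
    user_id_to_handle = {}
    for tweet in tweets:
        user_id = tweet.get('in_reply_to_user_id')
        handle = tweet.get('in_reply_to_screen_name')
        # Only record the pair when both halves are present and non-empty.
        if user_id and handle:
            user_id_to_handle[user_id] = handle
    return user_id_to_handle
```

The resulting dictionary only covers accounts that were replied to at least once, so it is a starting point rather than a complete mapping.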

@lenaschimmel
Collaborator

lenaschimmel commented Nov 19, 2022

I'm trying to understand what you are currently doing and if / how much it overlaps with what @flauschzelle and I have already done / are about to do...

So this is already done now by @flauschzelle:

  • Collect known and missing names from local archive data (tweets, mentions, dms, group dms, follower and following)
  • Load known and missing names from previous runs (/data/parsed_users.json)
  • Check if anything is actually missing
  • make a list of user ids to look up
  • look them up (with tweeterid)
  • write results to the file
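
The steps listed above could be sketched roughly as follows. This is not the code from the `userids` branch: the function names, the flat id-to-handle JSON layout, and the `lookup` callable are all assumptions made for illustration, and the remote lookup is passed in as a parameter so that no particular web service API is implied.

```python
import json
import os


def load_known_users(cache_path):
    """Load the id -> handle mapping saved by a previous run, if any."""
    if os.path.exists(cache_path):
        with open(cache_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    return {}


def update_user_map(archive_ids, archive_handles, cache_path, lookup):
    """Merge local and cached handles, then look up only what is missing.

    archive_ids:     all user ids seen in the local archive data
    archive_handles: id -> handle pairs found in the archive itself
    lookup:          hypothetical callable, id -> handle or None
    """
    known = load_known_users(cache_path)
    known.update(archive_handles)
    # Ids that neither the archive nor the cache could resolve.
    to_look_up = sorted(uid for uid in archive_ids if uid not in known)
    for uid in to_look_up:
        handle = lookup(uid)
        if handle is not None:
            known[uid] = handle
    # Persist the merged mapping so that later runs can skip resolved ids.
    with open(cache_path, 'w', encoding='utf-8') as f:
        json.dump(known, f, indent=2)
    return known, to_look_up
```

Because the cache is read first, a second run over the same archive should perform no lookups for ids that were resolved earlier.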

Currently working on:

  • Integrating your parser.py and @flauschzelle's user_id_parser.py (currently in my fork here, though we are not sure whether future work will happen primarily in my fork or @flauschzelle's fork)

Things I/we still plan to do:

  • unify coding style
  • unify style of log / user output
  • unify the approach for retries of failed requests
  • merge code into a single file so that the simple setup guide still works (Right-click this link parser.py and select "Save Link as"...)
  • add the approach by @n1ckfg as additional option, since it has both advantages and disadvantages

Things that seem useful, but that I didn't really look into:

@timhutton
Owner

timhutton commented Nov 19, 2022

@lenaschimmel Yes, there was some overlap. The branch looks good. To avoid calamity let's tackle it in small PRs:

  1. existing convert_tweet() function appends whatever id:handle connections it finds to a data structure of Users.
  2. new functionality to parse DMs, using Users, output in some simple way
  3. new functionality to parse followers/followings, using Users, output in some simple way
  4. new functionality to do remote lookup on Users, to improve the output of the above
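
As a rough illustration of step 2, a DM conversation could be rendered using whatever id-to-handle pairs have been collected so far. This sketch is not the parser's implementation: the function name is hypothetical, and the field names (`messages`, `messageCreate`, `senderId`, `createdAt`, `text`) are an assumption about the layout of the archive's direct message JSON.

```python
def render_dm_conversation(conversation, user_id_to_handle):
    """Return one text line per message, with sender ids resolved if known.

    Assumption: `conversation` looks like
    {'messages': [{'messageCreate': {'senderId': ..., 'createdAt': ...,
                                     'text': ...}}, ...]}
    """
    lines = []
    for entry in conversation.get('messages', []):
        message = entry.get('messageCreate')
        if message is None:
            continue  # skip entries that are not ordinary messages
        sender_id = message['senderId']
        handle = user_id_to_handle.get(sender_id)
        # Fall back to the raw id when no handle is known yet.
        label = f'@{handle}' if handle else f'user {sender_id}'
        lines.append(f"{message['createdAt']} {label}: {message['text']}")
    return lines
```

Falling back to the raw id keeps the output usable before any remote lookup (step 4) exists, and the same output improves automatically once more handles are known.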

@press-rouch
Collaborator

I've just made a note about getting full user data from the API (without a key!) on the followers issue:
#70 (comment)

@timhutton
Owner

timhutton commented Nov 19, 2022

Updated roadmap with current progress:

  1. existing convert_tweet() function appends whatever id:handle connections it finds to a data structure of Users:
  2. new functionality to parse DMs, using Users, output in some simple way:
  3. new functionality to parse followers/followings, using Users, output in some simple way:
  4. new functionality to do remote lookup on Users, to improve the output of the above:
  5. improve DMs by adding images, expanding links, etc.:

@Bebetternow22

Is there any way to add code to pull deleted DMs from your own personal account?


8 participants