Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fetch and parse full gmail message #5160

Merged
merged 11 commits into from
May 20, 2024
17 changes: 17 additions & 0 deletions .vscode/launch.json
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,23 @@
"internalConsoleOptions": "openOnSessionStart",
"console": "internalConsole",
"cwd": "${workspaceFolder}/packages/twenty-server/"
},
{
"name": "twenty-server - gmail fetch debug",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a bit overkill, let's remove it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didnt mean to commit, but I needed a way to debug the email fetching, I was hoping that the worker would do that, but it didnt, any tips how to do that? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the worker should do it actually, if you start the worker with the debugger and place a breakpoint it should respect it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It finally works! :) The missing peace for me was actually uncommenting the MESSAGE_QUEUE_TYPE env variable so its not doing the sync type of queue which always landed in one process and the worker didnt pick it up :D Im still learning a lot about this project and there are always some parts which you guys take for granted and im getting surprised :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plus i got finally a person created out of the conversation as well 🎉 had to figure out all these conditions that were made with a good intention 😄 (sending a gmail to gmail message didnt do what I wanted it to do)

"type": "node",
"request": "launch",
"runtimeExecutable": "yarn",
"runtimeVersion": "18",
"runtimeArgs": [
"nx",
"run",
"twenty-server:command",
"cron:messaging:gmail-fetch-messages-from-cache",
],
"outputCapture": "std",
"internalConsoleOptions": "openOnSessionStart",
"console": "internalConsole",
"cwd": "${workspaceFolder}/packages/twenty-server/"
}
]
}
4 changes: 2 additions & 2 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -58,10 +58,10 @@
"@types/lodash.camelcase": "^4.3.7",
"@types/lodash.merge": "^4.6.7",
"@types/lodash.pick": "^4.3.7",
"@types/mailparser": "^3.4.4",
"@types/nodemailer": "^6.4.14",
"@types/passport-microsoft": "^1.0.3",
"add": "^2.0.6",
"addressparser": "^1.0.1",
"afterframe": "^1.0.2",
"apollo-server-express": "^3.12.0",
"apollo-upload-client": "^17.0.0",
Expand Down Expand Up @@ -123,7 +123,6 @@
"lodash.snakecase": "^4.1.1",
"lodash.upperfirst": "^4.3.1",
"luxon": "^3.3.0",
"mailparser": "^3.6.5",
"microdiff": "^1.3.2",
"nest-commander": "^3.12.0",
"next": "14.0.4",
Expand Down Expand Up @@ -228,6 +227,7 @@
"@swc/helpers": "~0.5.2",
"@testing-library/jest-dom": "^6.1.5",
"@testing-library/react": "14.0.0",
"@types/addressparser": "^1.0.3",
"@types/apollo-upload-client": "^17.0.2",
"@types/bcrypt": "^5.0.0",
"@types/better-sqlite3": "^7.6.8",
Expand Down
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
import { Injectable, Logger } from '@nestjs/common';

import { AxiosResponse } from 'axios';
import { simpleParser } from 'mailparser';
import planer from 'planer';
import addressparser from 'addressparser';

import { GmailMessage } from 'src/modules/messaging/types/gmail-message';
import { MessageQuery } from 'src/modules/messaging/types/message-or-thread-query';
import { GmailMessageParsedResponse } from 'src/modules/messaging/types/gmail-message-parsed-response';
import { FetchByBatchesService } from 'src/modules/messaging/services/fetch-by-batch/fetch-by-batch.service';
import { formatAddressObjectAsParticipants } from 'src/modules/messaging/services/utils/format-address-object-as-participants.util';
import { assert } from 'src/utils/assert';

@Injectable()
export class FetchMessagesByBatchesService {
Expand Down Expand Up @@ -73,28 +74,25 @@ export class FetchMessagesByBatchesService {
return;
}

const { historyId, id, threadId, internalDate, raw } = message;

const body = atob(raw?.replace(/-/g, '+').replace(/_/g, '/'));

try {
const parsed = await simpleParser(body, {
skipHtmlToText: true,
skipImageLinks: true,
skipTextToHtml: true,
maxHtmlLengthToParse: 0,
});

const { subject, messageId, from, to, cc, bcc, text, attachments } =
parsed;
const {
historyId,
id,
threadId,
internalDate,
subject,
from,
to,
headerMessageId,
text,
attachments,
} = this.parseGmailMessage(message);

if (!from) throw new Error('From value is missing');

const participants = [
...formatAddressObjectAsParticipants(from, 'from'),
...formatAddressObjectAsParticipants(to, 'to'),
...formatAddressObjectAsParticipants(cc, 'cc'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rostaklein why are we removing these participants?

...formatAddressObjectAsParticipants(bcc, 'bcc'),
];

let textWithoutReplyQuotations = text;
Expand All @@ -115,12 +113,12 @@ export class FetchMessagesByBatchesService {
const messageFromGmail: GmailMessage = {
historyId,
externalId: id,
headerMessageId: messageId || '',
headerMessageId,
subject: subject || '',
messageThreadExternalId: threadId,
internalDate,
fromHandle: from.value[0].address || '',
fromDisplayName: from.value[0].name || '',
fromHandle: from[0].address || '',
fromDisplayName: from[0].name || '',
participants,
text: sanitizeString(textWithoutReplyQuotations || ''),
attachments,
Expand Down Expand Up @@ -157,4 +155,62 @@ export class FetchMessagesByBatchesService {

return { messages, errors };
}

private parseGmailMessage(message: GmailMessageParsedResponse) {
const subject = this.getPropertyFromHeaders(message, 'Subject');
const rawFrom = this.getPropertyFromHeaders(message, 'From');
const rawTo = this.getPropertyFromHeaders(message, 'To');
const messageId = this.getPropertyFromHeaders(message, 'Message-ID');
const id = message.id;
const threadId = message.threadId;
const historyId = message.historyId;
const internalDate = message.internalDate;

assert(id);
assert(threadId);
assert(historyId);
assert(internalDate);

const bodyData = this.getBodyData(message);
const text = bodyData ? Buffer.from(bodyData, 'base64').toString() : '';

return {
id,
headerMessageId: messageId,
threadId,
historyId,
internalDate,
subject,
from: addressparser(rawFrom),
to: addressparser(rawTo),
text,
attachments: [],
};
}

private getBodyData(message: GmailMessageParsedResponse) {
const firstPart = message.payload?.parts?.[0];

if (firstPart?.mimeType === 'text/plain') {
return firstPart?.body?.data;
}

return firstPart?.parts?.find((part) => part.mimeType === 'text/plain')
?.body?.data;
}

private getPropertyFromHeaders(
message: GmailMessageParsedResponse,
property: string,
) {
const value = message.payload?.headers?.find(
(header) => header.name === property,
)?.value;

if (value === undefined || value === null) {
throw new Error(`Cannot find property "${property}" in message headers`);
}

return value;
}
}
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
import { AddressObject } from 'mailparser';
import addressparser from 'addressparser';

import { Participant } from 'src/modules/messaging/types/gmail-message';

const formatAddressObjectAsArray = (
addressObject: AddressObject | AddressObject[],
): AddressObject[] => {
addressObject: addressparser.EmailAddress | addressparser.EmailAddress[],
): addressparser.EmailAddress[] => {
return Array.isArray(addressObject) ? addressObject : [addressObject];
};

Expand All @@ -13,24 +13,23 @@ const removeSpacesAndLowerCase = (email: string): string => {
};

export const formatAddressObjectAsParticipants = (
addressObject: AddressObject | AddressObject[] | undefined,
addressObject:
| addressparser.EmailAddress
| addressparser.EmailAddress[]
| undefined,
role: 'from' | 'to' | 'cc' | 'bcc',
): Participant[] => {
if (!addressObject) return [];
const addressObjects = formatAddressObjectAsArray(addressObject);

const participants = addressObjects.map((addressObject) => {
const emailAdresses = addressObject.value;
const address = addressObject.address;

return emailAdresses.map((emailAddress) => {
const { name, address } = emailAddress;

return {
role,
handle: address ? removeSpacesAndLowerCase(address) : '',
displayName: name || '',
};
});
return {
role,
handle: address ? removeSpacesAndLowerCase(address) : '',
displayName: addressObject.name || '',
};
});

return participants.flat();
Expand Down
Original file line number Diff line number Diff line change
@@ -1,12 +1,6 @@
export type GmailMessageParsedResponse = {
id: string;
threadId: string;
labelIds: string[];
snippet: string;
sizeEstimate: number;
raw: string;
historyId: string;
internalDate: string;
import { gmail_v1 } from 'googleapis';

export type GmailMessageParsedResponse = gmail_v1.Schema$Message & {
error?: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of having errors mixed with the message. Also, it seems that the new type is having all fields as optionals. Can we infer a type from gmail_v1.Schema$Message with all fields we need required?

Do we need to have errors here or can we add it as an additional parameter of functions when needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, it was there before, I assume perhaps because of the way how fetch-by-batch.service.ts is implemented? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't you adding the errors to this type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have errors here or can we add it as an additional parameter of functions when needed?

I tried to have a look at this once again and even though im still not completely able to understand what fetch-by-batch.service.ts it apparently adds the error property to the parsed object when parsing the response here https://github.com/twentyhq/twenty/blob/main/packages/twenty-server/src/modules/messaging/services/fetch-by-batch/fetch-by-batch.service.ts#L107

We can of course make the type nicer by making it a literal like gmail_v1.Schema$Message | GmailError but just imagine looping through such an array. Currently it checks whether the error property exists and returns early if it does https://github.com/twentyhq/twenty/blob/main/packages/twenty-server/src/modules/messaging/services/fetch-messages-by-batches/fetch-messages-by-batches.service.ts#L70-L74

Can we infer a type from gmail_v1.Schema$Message with all fields we need required?

I mean, this is the best type we can get, or is it not? It comes straight from Google that clearly says some properties might are optional. When testing, those we need were always defined, but isnt it better to be safe than sorry? Im doing a runtime assertion using:

assert(id);
assert(threadId);
assert(historyId);
assert(internalDate);

This would throw early (message wouldnt be parsed) if some of them are actually missing. Safer approach actually rather than letting it throw further down the line with "cant read property X of undefined" or saving invalid data into the DB.

Let me know what you think about these two topics :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the assert pattern that you are using

I've just remove the errors from your type and it looks good

code: number;
message: string;
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
import { Attachment } from 'mailparser';

export type GmailMessage = {
historyId: string;
externalId: string;
Expand All @@ -25,3 +23,10 @@ export type ParticipantWithMessageId = Participant & { messageId: string };
export type ParticipantWithId = Participant & {
id: string;
};

export type Attachment = {
id: string;
filename: string;
size: number;
mimeType: string;
};
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,6 @@ export const createQueriesFromMessageIds = (
messageExternalIds: string[],
): MessageQuery[] => {
return messageExternalIds.map((messageId) => ({
uri: '/gmail/v1/users/me/messages/' + messageId + '?format=RAW',
uri: '/gmail/v1/users/me/messages/' + messageId + '?format=FULL',
}));
};