Get your data into BigQuery

Business Benefits

Query data and take advantage of BigQuery capabilities.


Decide whether to use batch loading or streaming for your data.

Batch loading: Load your data periodically, like once per day or once per week. Use cases: load the contacts from your website once per day for the reporting and analysis, load the sales from your CRM once per month to create monthly reports.

Streaming: Constantly send your data to the BigQuery project, every time the new data is available; when you need real-time monitoring. Use cases: fraud detection, mobile application usage when you want to detect errors and bugs immediately.

Choose a Google service, if applicable, to help to get your data into BigQuery.

  • BigQuery Data Transfer Service: Google Play, Google Ads (2.5$ per month per customer ID), YouTube; connects to some Google and non-Google (Amazon S3) tools.
  • Google Analytics 360/Google Analytics 4: Digital analytics platforms that connect naturally to BigQuery.
  • Cloud Storage: Send your CSV files and then retrieve them in BigQuery.
  • Google Cloud Dataflow: Build the data pipelines for streaming with Cloud Pub/Sub and batch loading.
  • Google Cloud Function: Allows you to react to an event with an automatic action: for example, you can trigger the function when a CSV file is added to your Cloud Storage, and send this data to your BigQuery table.
  • Google Cloud Composer: Helps you to author, schedule, and monitor pipelines.
  • Google Data Fusion: Data integration service for moving your data from some Google and non-Google services into BigQuery.

You can use Cloud API and client libraries to connect your data source to Cloud Storage and BigQuery.

Load your file manually into BigQuery by creating a table and uploading it from your computer.

  1. Click on your dataset where you want to create your table, then click on Create Table (+).
  2. Browse for the file on your computer and give a name to your table.
  3. Click on Auto detect Schema, then input parameters so that BigQuery will autodetect your field data types.
  4. Click on Create table.
  5. You will now see your table available in your dataset:

Connect Google Analytics 4 to your BigQuery by clicking on BigQuery Linking > Link at the property level.

  1. Choose your destination project and data location.
  2. Choose the frequency of the data upload; daily or streaming. you will need streaming only if you need to analyze the analytics data in real time, which is quite rare.
  3. Review and submit. You will see a new dataset created in your BigQuery project quite soon - it will be called analytics-XXXXXXX with your property number.

Get your analytics or advertising data into BigQuery using third-party connectors like Owox, Supermetrics, Funnel IO, and StitchData.

Third-party connectors give you the possibility to connect known and widely used data sources like ads platforms, CRMs, and call tracking solutions, to your BigQuery project. In this example we’ll use Owox connector as an example of connecting a Facebook advertising account to your BigQuery project.

  1. In your Owox account click on New Pipeline.
  2. Select Facebook Ads.
  3. Select BigQuery as the data destination.
  4. Provide access to your Facebook account and to BigQuery.
  5. Select BigQuery dataset that will receive the data or create this dataset.
  6. Choose the start date for your import.

You will soon see the Facebook data in your dataset in BigQuery.

Use DML/DDL to generate data as one of the methods to create tables in your BigQuery Data Warehouse.

You can query the existing data to create your new table. For example, query the public data:

SELECT
payment_type, count(unique_key) nb_trips
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
group by payment_type
order by nb_trips DESC

And then add a line to create a new table payment_types in the test_dataset dataset in your fast-fire-262114 project:

CREATE TABLE `fast-fire-262114.test_dataset.payment_types` AS

If you have a custom data source that doesn’t connect with BigQuery easily, but this data source can generate the CSV files, you can use Cloud Storage and Cloud Function to get your data into BigQuery.

  1. Ask your developers to send you the CSVs files in batch mode, once per day, in your Cloud Storage bucket. They can use Google Cloud APIs and client libraries to connect to Google Cloud Storage.
  2. Create the Cloud Storage Bucket that will receive these files, and give your developers its name.

To create a Bucket:

  1. Go to Storage and click on Create Bucket and give it a unique name
  2. Choose where to store your data, you’ll need a Region location type and the location closest to you.
  3. Choose a Standard storage class and a fine-grained access control.
  4. Leave other settings unchanged.
  5. Click on Create.

Your bucket is ready to receive the information. Your developers can start sending the CSV files to your cloud storage bucket.

Configure a dataset and a table that will receive your data, so that BigQuery detects the schema automatically.

  1. Go to Cloud Functions.
  2. Click Create and give your function an explicit name, choose a Region that corresponds to your Storage region, choose Cloud Storage as Trigger type and Finalize/Create as Event type:
  3. Save and continue.
  4. Choose Python 3.7 and in the requirements.txt, add the following code:
google-cloud-bigquery
google-cloud-storage

Add code to import a CSV file into BigQuery to main.py.

Change the following elements to your own:

  • your-bucket to your own bucket name.
  • your-file-name.*.csv to your file name that will be loaded to Cloud Storage; You can use regular expressions if your file name changes, for example 2020-d{2}-d{2}-d+-contacts.*.csv
  • your-dataset and your_table to your dataset and table names.
"""Import a csv file into BigQuery."""

import logging
import os
import re

from google.cloud import bigquery
from google.cloud import storage

GCP_PROJECT = ‘your_project'


def bigqueryImport(event, context):
    """Import a csv file into BigQuery."""
    # get storage update data
    bucketname = ‘your-bucket‘
    filename = event['name']

    if not re.search(‘your-file-name.*.csv’, filename):
        logging.error('Unrecognized filename format: %s' % (filename))
        return

    # parse filename
    datasetname = ‘your-dataset'
    table_id = '%s.%s.%s' % (GCP_PROJECT, ‘your_dataset', ‘your_table')

    # log the receipt of the file
    uri = 'gs://%s/%s' % (bucketname, filename)

    # create bigquery client
    client = bigquery.Client()

    # get dataset reference
    dataset_ref = client.dataset(datasetname)

    # check if dataset exists, otherwise create
    try:
        client.get_dataset(dataset_ref)
    except Exception:
        logging.warn('Creating dataset: %s' % (datasetname))
        client.create_dataset(dataset_ref)

    # create a bigquery load job config
    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True
    job_config.create_disposition = 'CREATE_IF_NEEDED',
    job_config.write_disposition = 'WRITE_APPEND'

    #deleting file
    storage_client = storage.Client()
    #gcs = require('@google-cloud/storage')()
    bucket = storage_client.get_bucket(bucketname)
    file = bucket.blob(filename)
  	#const pr = file.delete()


    # create a bigquery load job
    try:
        load_job = client.load_table_from_uri(
            uri,
            table_id,
            job_config=job_config,
        )
        print('Load job: %s [%s]' % (
            load_job.job_id,
            table_id
        ))
        load_job.begin()
        load_job.result()
        file.delete()
    except Exception as e:
        logging.error('Failed to create load job: %s' % (e))

Click on Deploy to deploy your function.

Drag and drop your file to Cloud Storage and check if it was added to your BigQuery table.

Go to Cloud Function then to Logs to see any errors:

I cannot get the function to work. I get:

Deployment failure:

Build failed: *** Error compiling ‘./main.py’…
File “./main.py”, line 75
Displaying python-cloud-function.txt.
^
SyntaxError: invalid syntax; Error ID: 49c34848

What am I I doing wring?

Thanks in advance.

1 Like

Hi @shyross

Welcome to the community.

Can you please let me know where exactly in this process and which step you get this error in so i can better help?

I have created the bucket in Cloud Storage and the Data Set in BigQuery. The python file was coppied from the lesson and was updated with that info:

Blockquote

def bigqueryImport(event, context):
“”“Import a csv file into BigQuery.”""
# get storage update data
bucketname = ‘sales_digitalmarketing’
filename = event[‘name’]

if not re.search('sales.*\.csv', filename):
    logging.error('Unrecognized filename format: %s' % (filename))
    return

# parse filename
datasetname = 'your-dataset'
table_id = '%s.%s.%s' % (GCP_PROJECT, 'digital-marketing-341318.sales_datasheet', 'sales_2020')

Blockquote

Then when I created the function in Cloud Function and then when I click “Deploy” I get the error.

1 Like

1 Like

Thanks Stefan,

Will contact @khrystyna_grynko and see if I can get an answer on this question for you.

Hi @shyross,

From @khrystyna_grynko

You should rename ‘your-dataset’ into the name of your dataset.

Ok so I have this:

GCP_PROJECT = ‘digital-marketing-341318’

def bigqueryImport(event, context):
“”“Import a csv file into BigQuery.”""
# get storage update data
bucketname = ‘sales_digitalmarketing’
filename = event[‘name’]

if not re.search('sales.*\.csv', filename):
    logging.error('Unrecognized filename format: %s' % (filename))
    return

# parse filename
datasetname = 'sales_datasheet'
table_id = '%s.%s.%s' % (GCP_PROJECT, 'digital-marketing-341318.sales_datasheet', 'sales_2020')

Do I need to change the Runtime Service Account?

1 Like

I deployed and no data was added to the table.

1 Like

HI @shyross i noticed a couple of issues in the playbook, give the instructions another go and let me know if you still have the same issues. I contacted Khrystyna again in the meantime for you.

Great, thanks. It works now.

1 Like

Glad to hear that @shyross

Do you mind if you share with me a bit of information about what you where trying to achieve and how this playbook helped you?

I think this would be very helpful for others users to see how to use the playbooks correctly. I would really like to include this and give you a shout out in this weeks CXL newsletter.

Yes, no problem. There are a few data sources that do not have BigQuery connectors that I need to ingest data from. The main ones being keyword data from SEMRush and crawl data from Screaming Frog. The Python script in this course allows me to save the data as a csv file and use Cloud Storage and Cloud Function to get it into BigQuery. Before this I was creating a new table with the data in BigQuery and then inserting it into the proper table.

The next step for me would be to setup FUSE so I can mount Cloud Storage locally and save the csv files directly.

If you need anyone info from me please let me know.

1 Like

This is great and it will be helpful for others users to see your use case.

I just released all of Khrystynas playbooks over on the Expert playbooks list earlier today. Check them out here