Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Tam
by New Contributor III
  • 1399 Views
  • 2 replies
  • 0 kudos

TABLE_REDIRECTION_ERROR in AWS Athena After Databricks Upgrade to 14.3 LTS

I have a Databricks pipeline set up to create Delta tables on AWS S3, using Glue Catalog as the metastore. I was able to query the Delta table via Athena successfully. However, after upgrading the Databricks cluster from 13.3 LTS to 14.3 LTS, I began enc...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Tam,    It appears that you’ve encountered a TABLE_REDIRECTION_ERROR while working with your Databricks pipeline, AWS S3, Glue Catalog, and Athena. Let’s break down the issue and explore potential solutions: AWS Glue as a Catalog for Databric...
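For readers hitting the same error: a minimal sketch of the documented cluster-level settings that point Databricks at Glue as the metastore (the catalog ID below is a placeholder, not from this thread):

    # Cluster Spark config (set in the cluster UI, takes effect at cluster start)
    spark.databricks.hive.metastore.glueCatalog.enabled true
    # Optional, for a Glue catalog in another AWS account (placeholder account ID)
    spark.hadoop.hive.metastore.glue.catalogid 123456789012

Whether this alone resolves the 14.3 LTS redirection behavior depends on the workspace's catalog setup.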

1 More Replies
Coders
by New Contributor II
  • 1794 Views
  • 2 replies
  • 0 kudos

How to perform a deep clone for data migration from one data lake to another?

 I'm attempting to migrate data from Azure Data Lake to S3 using deep clone. The data in the source Data Lake is stored in Parquet format and partitioned. I've tried to follow the documentation from Databricks, which suggests that I need to register ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Coders, It appears that you’re encountering an issue while attempting to migrate data from Azure data lake to S3 using deep clone. Let’s break down the problem and explore potential solutions. Error Explanation: The error message you receive...
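For reference, a minimal PySpark sketch of the clone approach the docs describe, assuming the partitioned Parquet source can be addressed by path (all bucket, container, and table names here are placeholders):

    # CLONE of a Parquet path creates a Delta table at the target;
    # the clone is deep (data is copied) by default.
    spark.sql("""
        CREATE OR REPLACE TABLE delta.`s3://target-bucket/events_clone`
        CLONE parquet.`abfss://data@myaccount.dfs.core.windows.net/events`
    """)

The cluster needs credentials for both the ADLS source and the S3 target for this to run.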

1 More Replies
data-warriors
by New Contributor
  • 821 Views
  • 1 reply
  • 0 kudos

Recovery of an accidentally deleted Databricks workspace

Hi Team, I accidentally deleted our Databricks workspace, which had all our artefacts and control plane and was the primary resource for our team's working environment. Could anyone please help on priority regarding the recovery/restoration mechanis...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @data-warriors, I understand the urgency of your situation. Unfortunately, once a Databricks subscription is cancelled, all associated workspaces are permanently deleted and cannot be recovered.

Poonam17
by New Contributor II
  • 937 Views
  • 1 reply
  • 2 kudos

Not able to deploy a cluster in Databricks Community Edition

Hello team, I am not able to launch a Databricks cluster in Community Edition; it automatically gets terminated. Can someone please help here? Regards, Poonam

Latest Reply
kakalouk
New Contributor II
  • 2 kudos

I face the exact same problem. The message I get is this: "Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-062042a9d4be8725e @ 10.172.197.194. Please check network connectivity between the data plane and the control plane."

TheDataEngineer
by New Contributor
  • 2319 Views
  • 1 reply
  • 0 kudos

'replaceWhere' clause in spark.write for a partitioned table

Hi, I want to be clear about the 'replaceWhere' clause in spark.write. Here is the scenario: I would like to add a column to a few existing records. The table is already partitioned on the "PickupMonth" column. Here is an example, without 'replaceWhere': spark.read \ .f...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @TheDataEngineer, Let’s dive into the details of the replaceWhere clause in Spark’s Delta Lake. The replaceWhere option is a powerful feature in Delta Lake that allows you to overwrite a subset of a table during write operations. Specifically, ...
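To make the mechanics concrete, a minimal sketch assuming a Delta table at a placeholder path, partitioned on PickupMonth:

    from pyspark.sql.functions import lit

    # Read only the partition being rewritten, add the new column, then
    # overwrite just that slice; rows outside the predicate are untouched.
    (spark.read.format("delta").load("/delta/trips")
        .where("PickupMonth = '2024-01'")
        .withColumn("flagged", lit(True))
        .write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", "PickupMonth = '2024-01'")
        .save("/delta/trips"))

Every row written must satisfy the replaceWhere predicate, otherwise the write fails.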

chrisf_sts
by New Contributor II
  • 1329 Views
  • 1 reply
  • 0 kudos

Can I generate a uuid4 column when I do a COPY INTO command?

I have raw call log data and the logs don't have a unique id number, so I generate a uuid4 number when I load them using Spark. Now I want to save the records to a table and run a COPY INTO command every day to ingest new records. I am only appendi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @chrisf_sts, You can achieve this by generating UUIDs during the COPY INTO command. Here are a few approaches based on the database system you’re using: PostgreSQL: If you’re working with PostgreSQL, you can specify the columns explicitly in ...
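On Databricks itself, COPY INTO accepts a SELECT over the source files, so a per-row id can be generated inline with Spark SQL's uuid() function; a minimal sketch with placeholder table and path names:

    spark.sql("""
        COPY INTO main.calls.call_logs
        FROM (
            SELECT uuid() AS call_id, *
            FROM 's3://my-bucket/raw_call_logs/'
        )
        FILEFORMAT = JSON
    """)

COPY INTO keeps track of files it has already ingested, so re-running this daily appends only new records.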

vvk
by New Contributor II
  • 1565 Views
  • 2 replies
  • 0 kudos

Unable to upload a wheel file in Azure DevOps pipeline

Hi, I am trying to upload a wheel file to a Databricks workspace using an Azure DevOps release pipeline, to use it in the interactive cluster. I tried the "databricks workspace import" command, but it looks like it does not support .whl files. Hence, I tried to u...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @vvk, Uploading Python wheel files to an Azure Databricks workspace via an Azure DevOps release pipeline involves a few steps. Let’s troubleshoot the issue you’re facing: Authorization Error: The “Authorization failed” error you’re encounteri...
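For reference, a sketch using the legacy Databricks CLI from a release-pipeline step (cluster ID and paths are placeholders; the newer unified CLI has equivalent commands):

    # Copy the wheel to DBFS, then attach it to the interactive cluster
    databricks fs cp dist/my_pkg-0.1.0-py3-none-any.whl \
        dbfs:/FileStore/wheels/my_pkg-0.1.0-py3-none-any.whl
    databricks libraries install \
        --cluster-id 0123-456789-abcdef \
        --whl dbfs:/FileStore/wheels/my_pkg-0.1.0-py3-none-any.whl

"databricks workspace import" targets notebooks and source files, which is why it rejects .whl binaries.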

1 More Replies
caldempsey
by New Contributor
  • 1177 Views
  • 1 reply
  • 0 kudos

Delta Lake Spark fails to write _delta_log via a Notebook without granting the Notebook data access

I have set up a Jupyter Notebook w/ PySpark connected to a Spark cluster, where the Spark instance is intended to perform writes to a Delta table.I'm observing that the Spark instance fails to complete the writes if the Jupyter Notebook doesn't have ...

Data Engineering
deltalake
Docker
spark
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @caldempsey, Thank you for providing detailed information about your setup and the issue you’re encountering with Spark writes to a Delta table. Let’s dive into this behavior and explore potential solutions. Access to Data Location: You’ve co...
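As background for this symptom: Delta writes the _delta_log commit files from the driver, so the notebook-side process needs to reach the table path just like the executors do. A minimal sketch for a self-hosted PySpark setup (keys, bucket, and table path are placeholders):

    from pyspark.sql import SparkSession

    # Driver and executors both pick up these Hadoop s3a settings, so the
    # driver can commit _delta_log entries while executors write data files.
    spark = (SparkSession.builder
        .appName("delta-write")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY>")
        .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_KEY>")
        .getOrCreate())

    spark.range(10).write.format("delta").mode("append").save("s3a://my-bucket/my_table")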

Andrewcon
by New Contributor
  • 1631 Views
  • 1 reply
  • 0 kudos

Delta tables and YOLO computer vision tasks

Hi all, I would really appreciate it if someone could help me out. I feel it's both a data engineering and ML question. One thing we use at work is YOLO for object detection. I've managed to run YOLO by loading data from the blob storage, but I've seen tha...

Data Engineering
computer vision
Delta table
YOLO
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Andrewcon, Training computer vision models on Delta Live Tables in Databricks is an interesting challenge. Let’s break it down: Delta Live Tables: Delta Live Tables is a declarative framework for building reliable, maintainable, and testable ...

Jaynab_1
by New Contributor
  • 755 Views
  • 1 reply
  • 0 kudos

Trying to calculate Zonal_stats using Mosaic and H3

I am trying to calculate Zonal_stats for raster data using Mosaic and H3. I created a dataframe mapping the geometry data to H3 indexes. Previously I was calculating Zonal_stats using rasterio, a tif file, and geometry data in Python, which is slow. Now I want to ex...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Jaynab_1, Let’s explore how you can calculate zonal statistics using Mosaic and H3. While Mosaic itself doesn’t directly provide a built-in function for zonal statistics, we can leverage other tools and libraries to achieve this. Zonal Statis...

_databreaks
by New Contributor II
  • 658 Views
  • 1 reply
  • 0 kudos

Autoloader schemaHints converts valid values to null

I am ingesting JSON files from S3 using Auto Loader and would like to use schemaHints to define the datatype of one of the fields; that is, I want the field id to be of integer type. The DLT code below infers the id as string, with correct values...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @_databreaks, When using Auto Loader to ingest JSON files from S3, it’s essential to configure schema inference and evolution correctly. Let’s dive into the details: Schema Inference and Evolution: Auto Loader can automatically detect the sch...
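For comparison, a minimal Auto Loader sketch with schemaHints (paths are placeholders); values that cannot be cast to the hinted type end up null or in _rescued_data, which matches the symptom described:

    df = (spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")
        # Hint only the column that needs a non-string type
        .option("cloudFiles.schemaHints", "id BIGINT")
        .load("s3://my-bucket/raw/events/"))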

yatharth
by New Contributor III
  • 768 Views
  • 1 reply
  • 0 kudos

LZO codec not working for Graviton instances

Hi Databricks, I have a job where I am saving my data in JSON format, LZO-compressed, which requires the lzo-codec library. On shifting to Graviton instances I noticed that the same job started throwing an exception: Caused by: java.lang.RuntimeException: nati...

Latest Reply
yatharth
New Contributor III
  • 0 kudos

For more context, please use the following code to replicate the error:

    # Create a Python list containing JSON objects
    json_data = [
        {
            "id": 1,
            "name": "John",
            "age": 25
        },
        {
            "id": 2,
            "name": "Jane",
            ...
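A hedged guess at the rest of the repro, since the excerpt cuts off before the write (the codec class and output path are assumptions, not from the original post):

    df = spark.createDataFrame(json_data)
    # Writing LZO-compressed JSON needs the native lzo bindings,
    # which are not present on ARM/Graviton node images
    (df.write.mode("overwrite")
        .option("compression", "com.hadoop.compression.lzo.LzopCodec")
        .json("dbfs:/tmp/lzo_repro"))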

noimeta
by Contributor II
  • 4201 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks SQL: catalog of each query

Currently, we are migrating from hive metastore to UC. We have several dashboards and a huge number of queries whose catalogs have been set to hive_metastore and using <db>.<table> access pattern.I'm just wondering if there's a way to switch catalogs...
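One common approach while migrating: set the default catalog for the session so existing <db>.<table> references resolve inside UC, e.g. (catalog name is a placeholder):

    USE CATALOG main;
    SELECT * FROM mydb.mytable;  -- now resolves to main.mydb.mytable

Workspaces also have a default-catalog admin setting, which avoids touching each query individually.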

Latest Reply
abdulrahim
New Contributor II
  • 4 kudos

Absolutely accurate, in order to grow your business you need to create an image of your brand such that it is the first thing coming to customers mind when they think about a certain product or service that’s where social media marketing agencies com...

6 More Replies
Serhii
by Contributor
  • 6824 Views
  • 7 replies
  • 4 kudos

Resolved! Saving complete notebooks to GitHub from Databricks repos.

When saving a notebook to a GitHub repo, it is stripped to Python source code. Is it possible to save it in the ipynb format?

Latest Reply
GlennStrycker
New Contributor III
  • 4 kudos

When I save+commit+push my .ipynb file to my linked git repo, I noticed that only the cell inputs are saved, not the output.  This differs from the .ipynb file I get when I choose "File / Export / iPython Notebook".  Is there a way to save the cell o...

6 More Replies
GlennStrycker
by New Contributor III
  • 1768 Views
  • 1 reply
  • 0 kudos

Resolved! Saving ipynb notebooks to git does not include output cells -- differs from export

When I save+commit+push my .ipynb file to my linked git repo, I noticed that only the cell inputs are saved, not the output.  This differs from the .ipynb file I get when I choose "File / Export / iPython Notebook".  Is there a way to save the cell o...

Latest Reply
GlennStrycker
New Contributor III
  • 0 kudos

I may have figured this out.  You need to allow output in the settings, which will add a .databricks file to your repo, then you'll need to edit the options on your notebook and/or edit the .databricks file to allow all outputs.
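For anyone searching later: the file mentioned above is a plain-text pattern list; a hedged illustration of what it can contain (the exact filename and semantics come from the workspace settings flow, so treat this as illustrative):

    # .databricks/commit_outputs -- one glob per line; matching
    # notebooks keep their outputs when committed
    **/*.ipynb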


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group