Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

164079
by Contributor II
  • 2949 Views
  • 4 replies
  • 0 kudos

Resolved! authentication is not configured for provider

Hi team, I recently started getting this message when trying to add some new config or change my workspace with Terraform: Error: cannot create global init script: authentication is not configured for provider. Please check https://registry.terraform.io/p...

Latest Reply
Vivian_Wilfred
Honored Contributor
  • 0 kudos

Hi @Avi Edri, it looks like you are using a provider that is authenticated to the Accounts console (https://accounts.cloud.databricks.com) to create a global init script within the workspace. Can you try authenticating with the workspace host and a PAT token? Follow th...

3 More Replies
Tripalink
by New Contributor III
  • 6271 Views
  • 6 replies
  • 2 kudos

Resolved! Failed to fetch archive.ubuntu

I am trying to use the Selenium WebDriver for a scraping project in Databricks. The notebook used to run properly but now has an issue with the Get:1 http://archive.ubuntu.com/ubuntu focal/main amd64 fonts-liberation all 1:1.07.4-11 [822 kB] command. In...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Hi @Dagart Allison. I've created a new version of the manual for Selenium with Databricks. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL

5 More Replies
Tripalink
by New Contributor III
  • 3599 Views
  • 2 replies
  • 0 kudos

Using Selenium Chrome Driver in Databricks, runs the first time but fails after that

I have a notebook that uses a Selenium Web Driver for Chrome and it works the first time I run the notebook. If I run the notebook again, it will not work and gives the error message: WebDriverException: Message: unknown error: unable to discover op...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

Hi @Dagart Allison. I've created a new version of the manual for Selenium with Databricks. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL

1 More Replies
Arun_tsr
by New Contributor III
  • 5529 Views
  • 6 replies
  • 2 kudos

How to do bucketing in Databricks?

We are migrating a job from on-prem to Databricks. We are trying to optimize the jobs but couldn't use bucketing, because by default Databricks stores all tables as Delta tables and it shows an error that bucketing is not supported for Delta. Is there anyw...

Latest Reply
Pat
Honored Contributor III
  • 2 kudos

Hi @Arun Balaji, bucketing is not supported for Delta tables, as you have noticed. For optimization and best practices with Delta tables, check these: https://docs.databricks.com/optimizations/index.html https://docs.databricks.com/delta/best-prac...

5 More Replies
164079
by Contributor II
  • 8910 Views
  • 13 replies
  • 23 kudos

Resolved! Users are failing to query data from S3 bucket

Hi team, users are unable to run SELECT on data located in S3 buckets, although the S3 permissions are OK. The only way they manage to do it is by being granted the Databricks workspace admin permission. The error is attached. Thanks!

Latest Reply
karthik_p
Esteemed Contributor
  • 23 kudos

@Avi Edri, adding some more info to @Pat Sienkiewicz's suggestion: are you using a cluster with an instance profile? If you have an instance profile configured, please validate that read permissions are there on that bucket and instance profile ass...

12 More Replies
wats0ns
by New Contributor III
  • 15932 Views
  • 7 replies
  • 10 kudos

Resolved! Migrate tables from one azure databricks workspace to another

Hello all, I'm currently trying to move the tables contained in one Azure workspace to another, because of a change in the way we use our resource groups. I have not been able to move more than metadata with the databrickslabs/migrate repo. I was won...

Latest Reply
Pat
Honored Contributor III
  • 10 kudos

Hi @Quentin Maire, we need a few more details. Where is your data stored? Are you using external or managed tables? The migrate tool allows you to export DDL statements, not the data itself. I can think of a few scenarios off the top of my head. If you had p...

6 More Replies
Arun_tsr
by New Contributor III
  • 1717 Views
  • 2 replies
  • 0 kudos

Spark SQL output multiple small files

We have multiple joins involving a large table (about 500 GB in size). The output of the joins is stored in multiple small files, each 800 KB to 1.5 MB in size. Because of this, the job is split into multiple tasks and takes a long time to complete....

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi @Arun Balaji, could you please provide the error message you are receiving?

1 More Replies
Kavin
by New Contributor II
  • 1637 Views
  • 1 reply
  • 2 kudos

Issue converting the datasets into JSON

I'm a newbie to Databricks and I need to convert data sets into JSON. I tried both FOR JSON AUTO and FOR JSON PATH; however, I'm getting an issue: [PARSE_SYNTAX_ERROR] Syntax error at or near 'json' line. My query works fine without FOR JSON AUTO AND FOR...

Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi @Kavin Natarajan, could you please go through https://www.tutorialkart.com/apache-spark/spark-write-dataset-to-json-file-example/? It looks like the steps there are okay.

ae20cg
by New Contributor III
  • 4961 Views
  • 3 replies
  • 3 kudos

Databricks web terminal not able to parse notebooks.

In the web terminal, I am not able to search for text, for example using grep -l "search_term" db_notebook. This leads to an "Operation not permitted on <notebook>" error. Any ideas why? This happens for all DB notebooks in my cluster. Thanks!

Latest Reply
Debayan
Esteemed Contributor III
  • 3 kudos

Hi @Andrej Erkelens, could you please start the shell with %sh? Also, could you please provide a full screenshot of the error here, along with the whole command you tried?

2 More Replies
StanleyTang
by New Contributor III
  • 2143 Views
  • 3 replies
  • 4 kudos

How to run SQL queries from services when data migrated from SQL server to data lake?

Currently our service provides an API to serve the purchase records. The purchase records are stored in SQL database. To simplify, when users want to get their recent purchase records, they make an API call. The API call will run a SQL query on the D...

Latest Reply
Debayan
Esteemed Contributor III
  • 4 kudos

Hi @Stanley Tang, there are several REST API resources managed by Databricks. You can refer to https://docs.databricks.com/dev-tools/api/latest/index.html. In this scenario, the SQL Warehouses API can be used: https://docs.databricks.com/sql/api/sql-endpo...

2 More Replies
gideont
by New Contributor III
  • 3208 Views
  • 2 replies
  • 2 kudos

Resolved! spark sql update really slow

I tried to use Spark as much as possible but experienced some regression. I'm hoping to get some direction on how to use it correctly. I've created a Databricks table using spark.sql: spark.sql('select * from example_view ') \ .write \ .mode('overwr...

Latest Reply
Pat
Honored Contributor III
  • 2 kudos

Hi @Vincent Doe, updates are available in Delta tables, but under the hood you are updating Parquet files. This means that each update needs to find the file where the records are stored, then rewrite the file to a new version, and make the new file the current v...

1 More Replies
ferbystudy
by New Contributor III
  • 2989 Views
  • 3 replies
  • 3 kudos

Resolved! Can't read a simple .CSV from a blob

Guys, I am using "Databricks Community" to study. I put some files in a blob and granted all access, but I have no idea why Databricks is not reading them. Please see the code below, and thanks for helping!

Latest Reply
ferbystudy
New Contributor III
  • 3 kudos

Guys, I found the problem! ****, Databricks! Hahaha. First I went to the data lake and set all access to public and granted all users owner access. I had already mounted it before, so after these changes you will need to unmount and then mount again! Yeah, after that it ...

2 More Replies
rams
by Contributor
  • 2541 Views
  • 3 replies
  • 4 kudos

Rollback error - Configuring Databricks lakehouse platform with AWS account

I have logged in to my Databricks account, and while creating the workspace I chose the quickstart approach to configure Databricks with AWS. During the quickstart process the Databricks page redirects to the AWS CloudFormation stack page where the ac...

Latest Reply
Debayan
Esteemed Contributor III
  • 4 kudos

Hi @rams shonu, the error looks like it is due to the length of the roleName. AWS IAM role names are limited to 64 characters. Could you please try to edit the default roleName and try to append?
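
The 64-character limit mentioned in this reply can be checked before a stack is launched. The following is only a minimal sketch in plain Python: the helper name `fit_role_name` and the sample names are hypothetical and are not taken from the quickstart template. It trims the base name so that the base plus an appended suffix stays within the limit.

```python
# AWS IAM role names are limited to 64 characters (as stated in the reply above).
IAM_ROLE_NAME_MAX_LEN = 64


def fit_role_name(base: str, suffix: str = "") -> str:
    """Return base + suffix, trimming the base so the result fits the limit.

    Hypothetical helper for illustration; not part of any Databricks or AWS tool.
    """
    if len(suffix) > IAM_ROLE_NAME_MAX_LEN:
        raise ValueError("suffix alone exceeds the IAM role name limit")
    keep = IAM_ROLE_NAME_MAX_LEN - len(suffix)
    return base[:keep] + suffix


# Made-up names, for illustration only.
long_base = "databricks-workspace-stack-" + "x" * 60
role_name = fit_role_name(long_base, "-role")
print(len(role_name), role_name.endswith("-role"))
```

Trimming the base rather than the suffix keeps whatever is appended intact, which matters if other resources are matched on that suffix.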

2 More Replies
