Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dnirmania
by Contributor
  • 3780 Views
  • 4 replies
  • 0 kudos

Read file from AWS S3 using Azure Databricks

Hi Team, I am currently working on a project to read CSV files from an AWS S3 bucket using an Azure Databricks notebook. My ultimate goal is to set up Auto Loader in Azure Databricks so that it reads new files from S3 and loads the data incrementally. Howe...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

No, it is very easy. Follow this guide and it will work: https://github.com/aviral-bhardwaj/MyPoCs/blob/main/SparkPOC/ETLProjectsAWS-S3toDatabricks.ipynb

3 More Replies
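The linked guide essentially points Spark's S3A filesystem at AWS credentials before reading. A minimal sketch of that idea (the key names are the standard Hadoop S3A options; the bucket, path, and credential values are placeholders, not taken from the thread):

```python
# Hypothetical S3A credential options for reading an AWS bucket from Azure
# Databricks; in practice, keep the keys in a secret scope, not in code.
s3_opts = {
    "fs.s3a.access.key": "<AWS_ACCESS_KEY_ID>",
    "fs.s3a.secret.key": "<AWS_SECRET_ACCESS_KEY>",
    "fs.s3a.endpoint": "s3.amazonaws.com",
}

# On a cluster, apply the options and read new files incrementally with
# Auto Loader (commented out because it needs a live Spark session):
# for k, v in s3_opts.items():
#     spark.conf.set(k, v)
# df = (spark.readStream.format("cloudFiles")
#       .option("cloudFiles.format", "csv")
#       .option("cloudFiles.schemaLocation", "dbfs:/tmp/_schema")
#       .load("s3a://<bucket>/<prefix>/"))
```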
William_Scardua
by Valued Contributor
  • 12838 Views
  • 4 replies
  • 1 kudos

How to read data from Azure Log Analytics?

Hi guys, I need to read data from an Azure Log Analytics Workspace directly. Any ideas? Thank you.

Latest Reply
alexott
Databricks Employee
  • 1 kudos

You can use the Kusto Spark connector for that: https://github.com/Azure/azure-kusto-spark/blob/master/docs/KustoSource.md#source-read-command It heavily depends on how you access the data; there could be a need to use an ADX cluster for it: https://learn.mi...

3 More Replies
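The connector read that alexott describes follows the usual Spark source pattern. A hedged sketch (the option names follow the linked KustoSource docs, but double-check them against your connector version; every value below is a placeholder):

```python
# Hypothetical options for the azure-kusto-spark connector.
kusto_opts = {
    "kustoCluster": "<cluster>.<region>",
    "kustoDatabase": "<database>",
    "kustoQuery": "MyTable | where Timestamp > ago(1d)",  # KQL runs on ADX
    "kustoAadAppId": "<app-id>",
    "kustoAadAppSecret": "<app-secret>",
    "kustoAadAuthorityID": "<tenant-id>",
}

# On a cluster with the connector library installed:
# df = (spark.read
#       .format("com.microsoft.kusto.spark.datasource")
#       .options(**kusto_opts)
#       .load())
```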
KristiLogos
by Contributor
  • 3175 Views
  • 2 replies
  • 0 kudos

Resolved! GCS Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/in

I’m running Databricks on Azure and trying to read a CSV file from a Google Cloud Storage (GCS) bucket using Spark. However, despite configuring Spark with a Google service account key, I’m encountering the following error: Error getting access token fr...

Latest Reply
ShivangiB
New Contributor III
  • 0 kudos

Hey @KristiLogos, can you please share in what format the key was stored in gsa_private_key? We are actually using a Key Vault-based secret scope.

1 More Replies
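The metadata-server error usually means the GCS connector fell back to instance-metadata auth because the keyfile settings were not picked up. A sketch of the Hadoop config keys involved (the values are placeholders, and the exact key names vary by connector version, so verify against the GCS connector docs):

```python
# Hypothetical cluster Spark config for GCS service-account auth.
gcs_auth_opts = {
    "spark.hadoop.google.cloud.auth.service.account.enable": "true",
    "spark.hadoop.fs.gs.auth.service.account.email": "<sa>@<project>.iam.gserviceaccount.com",
    "spark.hadoop.fs.gs.auth.service.account.private.key.id": "<key-id>",
    "spark.hadoop.fs.gs.auth.service.account.private.key": "<PEM private key>",
    "spark.hadoop.fs.gs.project.id": "<project>",
}
# These go into the cluster's Spark config. If keyfile auth is not configured,
# the connector tries the GCE metadata endpoint (http://169.254.169.254) and,
# off-GCP, fails with exactly the error quoted in the post title.
```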
thomas_berry
by Databricks Partner
  • 1572 Views
  • 1 replies
  • 0 kudos

Federated query on the source

Hello, I want to be able to run an arbitrary query on the source before its result gets sent to Databricks. I want to create something like this: create table gold.bigquery USING org.apache.spark.sql.jdbc options( url "jdbc:postgresql://---:---/---", dri...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi thomas_berry, how are you doing today? You're spot on with your understanding, and you're not alone in running into this limitation. Unity Catalog doesn't currently support creating tables using a JDBC query like in your e...

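For the query-pushdown part, Spark's plain JDBC reader does accept a `query` option in place of `dbtable`, which runs the statement on the source and ships only its result. A sketch with placeholder connection details (not the poster's actual endpoint):

```python
# Hypothetical JDBC read that pushes an arbitrary query to the source.
jdbc_opts = {
    "url": "jdbc:postgresql://<host>:<port>/<db>",
    "driver": "org.postgresql.Driver",
    # The whole SELECT executes on the source database; Spark only sees rows.
    "query": "SELECT id, amount FROM sales WHERE amount > 100",
    "user": "<user>",
    "password": "<password>",
}

# On a cluster:
# df = spark.read.format("jdbc").options(**jdbc_opts).load()
```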
AndrewBeck
by New Contributor
  • 2033 Views
  • 1 replies
  • 1 kudos

Python UDF support in Unity Catalog and runtime 13.3?

Hi community, I am running Databricks Unity Catalog. In the Databricks UI, I see the policy "shared-gp-(r6g)-small" and Runtime 13.3. (I have access to larger instances; I'm just running a PoC on a small instance.) Can anyone explain what looks like an inc...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Great question, and yeah, what you're seeing is a bit of a confusing experience that trips up a lot of folks working with Unity Catalog (UC). Let's break it down. What's working for you: from pyspark.sql.types import LongType; def squared_typed(s):...

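The fragment in the reply looks like the classic squared-UDF example; reconstructed here as a sketch (the registration call is commented out because it needs a live session, and the registered name is illustrative):

```python
def squared_typed(s: int) -> int:
    # Plain Python function that would back the UDF.
    return s * s

# On a cluster:
# from pyspark.sql.types import LongType
# spark.udf.register("squaredWithPython", squared_typed, LongType())
# spark.sql("SELECT id, squaredWithPython(id) FROM range(5)").show()
```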
htd350
by New Contributor II
  • 1618 Views
  • 1 replies
  • 1 kudos

Predictive Optimization & Serverless Compute

Hello, I have a hard time understanding how predictive optimization works if serverless compute is not enabled. According to the documentation: Predictive optimization identifies tables that would benefit from ANALYZE, OPTIMIZE, and VACUUM operations and que...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @htd350, Predictive optimization in Databricks largely depends on serverless compute to execute operations like ANALYZE, OPTIMIZE, and VACUUM, but I'm not 100% sure whether serverless is needed in all scenarios. I'll check internally and confirm...

mrstevegross
by Contributor III
  • 1377 Views
  • 3 replies
  • 0 kudos

Graviton & containers?

Currently, DBR does not permit a user to run a containerized job on Graviton machines (per these docs). In our case, we're running containerized jobs on a pool. We are exploring adopting Graviton, but, per those docs, DBR won't let us do that. Are t...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @mrstevegross, I have found these docs from Databricks about environments; as you can see, it is in public preview... If you find my previous answer helpful, feel free to mark it as the solution so it can help others as well. Thanks! Isi

2 More Replies
suppome
by New Contributor
  • 698 Views
  • 1 replies
  • 0 kudos

Can CAN RESTART read logs from Jobs and Spark?

Is it possible to read logs from a Job or workflow run when I have the CAN RESTART role?

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @suppome, I share with you the security model for clusters and Jobs. Hope this helps. Isi

felix_counter
by New Contributor III
  • 20679 Views
  • 7 replies
  • 3 kudos

How to authenticate databricks provider in terraform using a system-managed identity?

Hello, I want to authenticate the Databricks provider using a system-managed identity in Azure. The identity resides in a different subscription than the Databricks workspace. According to the "authentication" section of the Databricks provider docume...

Data Engineering
authentication
databricks provider
managed identity
Terraform
Latest Reply
goTEEMgo
New Contributor II
  • 3 kudos

Add an environment variable to your run environment: add TF_LOG and set it to true. Scroll through the output and look for an OAuth API call, then look at the resource. I have run into the same problem, and it looks like our app registration for the AzureDatabricks enterprise applicatio...

6 More Replies
aswithap
by Databricks Partner
  • 1454 Views
  • 1 replies
  • 0 kudos

Feasibility of Dynamically Reusing Common user defined functions Across Multiple DLT Notebooks

Hi @DataBricks team, I'm exploring ways to enable dynamic reuse of common user-defined functions across multiple notebooks in a DLT (Delta Live Tables) pipeline. The goal is to avoid duplicating code and maintain a centralized location for commo...

  • 1454 Views
  • 1 replies
  • 0 kudos
Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

A simple solution and recommended approach can be: if possible, club all those common user-defined functions into a structured Python package / whl file. Once this whl file is created, you can simply upload it to your catalog volume and the f...

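That suggestion can be sketched as a tiny package; the layout, names, and volume path below are illustrative, not a prescribed structure:

```python
# Hypothetical wheel layout for shared DLT helpers:
#   common_udfs/
#       __init__.py
#       transforms.py   <- functions like the one below
#   pyproject.toml
#
# Build the wheel (e.g. `python -m build`), upload the .whl to a UC volume,
# then in each DLT notebook:
#   %pip install /Volumes/<catalog>/<schema>/<volume>/common_udfs-0.1-py3-none-any.whl

def clean_email(s: str) -> str:
    """Example shared helper: normalize an email address."""
    return s.strip().lower()
```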
vishaldevarajan
by New Contributor II
  • 4286 Views
  • 3 replies
  • 0 kudos

Unable to read Excel files in Azure Databricks (UC-enabled workspace)

Hello, After adding the Maven library com.crealytics:spark-excel_2.12:0.13.5 to the artifact allowlist, I installed it at the Azure Databricks cluster level (shared, Unity Catalog enabled, Runtime 15.4). Then I tried to create a df for the exc...

Data Engineering
Azure Databricks
Excel File
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

I did a little more digging and found further information: Unity Catalog does not natively support reading Excel files directly. Based on the provided context, there are a few key points to consider. Third-party libraries: reading Excel files in D...

2 More Replies
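For reference, a spark-excel read on a cluster where the library does load typically looks like the sketch below (the options follow the crealytics README; the path and sheet address are placeholders):

```python
# Hypothetical spark-excel read options.
excel_opts = {
    "header": "true",
    "inferSchema": "true",
    "dataAddress": "'Sheet1'!A1",  # sheet name and start cell
}

# On a cluster with the library installed and allowlisted:
# df = (spark.read.format("com.crealytics.spark.excel")
#       .options(**excel_opts)
#       .load("/Volumes/<catalog>/<schema>/<volume>/report.xlsx"))
```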
walgt
by Databricks Partner
  • 4357 Views
  • 1 replies
  • 1 kudos

Resolved! Databricks data engineer associate exam

Hi everyone, I'm preparing for the Databricks Data Engineer Associate certification. On the Databricks website, they list the following self-paced courses available in Databricks Academy for exam preparation: Data Ingestion with Delta Lake, Deploy Worklo...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings, Yes, you have identified the correct sequence of courses to take before attempting the exam. I would also recommend gaining at least six months of practical experience using Databricks for data engineering tasks prior to sitting for the ce...

Sadam97
by New Contributor III
  • 1662 Views
  • 3 replies
  • 0 kudos

Predictive Optimization is not running

We have enabled predictive optimization at the account level and metastore level. The enabled checkbox can be seen in the catalog details and table details. When I query the system.storage.predictive_optimization_operations_history table, it is still empty....

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

I can't help with your specific workspace as I don't have access to any customer environment. Support can help if you open a ticket with them, but at this point I am out of suggestions.

2 More Replies
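One sanity check is to query the history table the poster mentions directly; a sketch (note that, as far as I know, the table can lag enablement by some time, so an empty result shortly after enabling is not by itself proof of misconfiguration):

```python
# Query text for the system table named in the post; run via spark.sql on a
# workspace with system tables enabled.
po_history_query = """
SELECT *
FROM system.storage.predictive_optimization_operations_history
LIMIT 20
"""

# On a cluster:
# display(spark.sql(po_history_query))
```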
ShivangiB
by New Contributor III
  • 2646 Views
  • 8 replies
  • 0 kudos

Not Able To Access GCP storage bucket from Databricks

While running: df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load('path'); df.show() I am getting the error: java.io.IOException: Invalid PKCS8 data. Cluster Spark config: spark.hadoop.fs.gs.auth.service....

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

At this point it is out of my area of knowledge and I don't have any further suggestions. You may want to consider contacting Databricks Support if you have a support contract.

7 More Replies
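"Invalid PKCS8 data" often indicates that the private key's newlines were stored escaped (as a literal backslash-n) when the service-account JSON was pasted into the secret scope. A small helper illustrating that guess (this is an assumption about the root cause, not a confirmed fix for this thread):

```python
def normalize_private_key(raw: str) -> str:
    # Secrets pasted from JSON often carry literal "\n" two-character
    # sequences; the GCS connector needs real newlines in the PEM body
    # to parse the PKCS8 key.
    return raw.replace("\\n", "\n")

# On a cluster, with a hypothetical scope and key name:
# key = normalize_private_key(dbutils.secrets.get("scope", "gsa_private_key"))
```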
Phani1
by Databricks MVP
  • 9009 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks with Private cloud

Hi Databricks Team, Is it possible for Databricks to offer support for private cloud environments other than Azure, GCP, and AWS? The client intends to utilize Databricks in their own cloud for enhanced security. If this is feasible, what is the proce...

Latest Reply
mtatusDHS
New Contributor II
  • 0 kudos

We're looking at Databricks, but would prefer to use a Pure Storage Array to house data, mostly because of the cost of data storage for cloud providers. We're okay using cloud compute, but storage is much more feasible for us with local/private stora...

1 More Replies