Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Forum Posts

Phani1
by Databricks MVP
  • 2545 Views
  • 3 replies
  • 0 kudos

Autoloader file latency

Hi Team, I would like to understand if there is a metadata table for Auto Loader in Databricks that captures information about file arrival and processing. The reason we are experiencing data issues is that our table A receives hundreds of files ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Check with the cloud_files_state() API. You can find examples here: https://docs.databricks.com/en/ingestion/auto-loader/production.html#querying-files-discovered-by-auto-loader

2 More Replies
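
As a rough illustration of the reply above: cloud_files_state is queried as a SQL table-valued function that takes the Auto Loader stream's checkpoint location. The sketch below only builds the query string, so it runs outside Databricks too. The helper name and the checkpoint path are hypothetical and used only for illustration.

```python
def cloud_files_state_query(checkpoint_path: str) -> str:
    """Build a SQL query against Auto Loader's cloud_files_state function.

    The helper name and any path passed to it are hypothetical; only the
    cloud_files_state(<checkpoint>) function itself comes from the
    Databricks documentation linked in the reply.
    """
    # Refuse paths containing a single quote rather than guess at escaping rules.
    if "'" in checkpoint_path:
        raise ValueError("checkpoint path must not contain a single quote")
    return f"SELECT * FROM cloud_files_state('{checkpoint_path}')"


query = cloud_files_state_query("/tmp/checkpoints/table_a")  # hypothetical path
print(query)
# In a Databricks notebook, where `spark` is predefined, the discovered files
# could then be inspected with: spark.sql(query).show(truncate=False)
```

The result lists the files Auto Loader has discovered for that stream, which is the per-file metadata the original question asks about.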
VGS777
by New Contributor III
  • 2762 Views
  • 2 replies
  • 2 kudos

Resolved! Regarding cloning my gitrepo under workspace/Users/user_name

Hi all, I recently started using Databricks. I want to clone my Git repo under the workspace/Users/user_name path, but I am not able to. By default, I can clone only under the repo directory. Can anyone please advise me on this? Thank you.

Latest Reply
VGS777
New Contributor III
  • 2 kudos

Thanks for this advice.

1 More Replies
Surajv
by New Contributor III
  • 4819 Views
  • 4 replies
  • 0 kudos

Connect my spark code running in AWS ECS to databricks cluster

Hi team, I wanted to know if there is a way to connect a piece of my PySpark code running in ECS to a Databricks cluster and leverage the Databricks compute using Databricks Connect. I see Databricks Connect is for connecting local IDE code to databrick...

Get Started Discussions
AWS
databricks connect
ecs
pyspark
Latest Reply
Surajv
New Contributor III
  • 0 kudos

Noted @Retired_mod @RonDeFreitas. I am currently using Databricks runtime v12.2 (which is < v13.0). I followed this doc (Databricks Connect for Databricks Runtime 12.2 LTS and below) and connected my local terminal to Databricks cluster and was able ...

3 More Replies
Data_Engineer3
by Contributor III
  • 5958 Views
  • 2 replies
  • 0 kudos

Resolved! spark context in databricks

Hi @all, in Azure Databricks I am using the for-each-batch functionality of Structured Streaming. In one of the functions I am creating a temp view from a PySpark DataFrame (not a GlobalTempView) and trying to access the same temp view by using spark.sql functiona...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

Do you face this issue without Spark streaming as well? Also, could you share minimal repro code, preferably without streaming?

1 More Replies
BabuMahesh
by New Contributor
  • 1056 Views
  • 0 replies
  • 0 kudos

Databricks & Bigquery

Databricks is packaging an old version of the BigQuery jar (Databricks also repackaged it and created a fat jar), and our application needs the latest jar. Now the latest jar depends on the spark-bigquery-connector.properties file for a property scala.binary.vers...

rudyevers
by New Contributor III
  • 2251 Views
  • 1 replies
  • 0 kudos

Unity catalog internal error - quality monitoring

I am trying to get my head around the quality monitoring functionality in Unity Catalog. I configured one of the tables in our Unity Catalog. My assumption is that the profile and drift metrics tables are automatically created. But when I get an internal e...

Latest Reply
jreddy
New Contributor II
  • 0 kudos

Hi, were you able to resolve this? I am having a similar issue. Thanks.

shubhamshah1412
by New Contributor II
  • 2387 Views
  • 1 replies
  • 0 kudos

Generate Excel for a SQL query

Greetings, I am using a Java Spring Boot application that is supposed to respond with an Excel file based on a request. My current approach involves reading data using JDBC drivers, storing them in appropriate data structures, writing them to an Excel file which ...

Latest Reply
shubhamshah1412
New Contributor II
  • 0 kudos

Thanks for putting this together @Retired_mod. I see that this approach will help to generate an Excel file after receiving the data from Databricks in the form of a ResultSet, which has to be parsed. I believe this approach is the appropriate way to generat...

hayden_blair
by New Contributor III
  • 5562 Views
  • 0 replies
  • 0 kudos

Error authenticating databricks.sdk.WorkspaceClient with external workspace via Azure Native Auth

I am referencing this doc to initialize a databricks.sdk.WorkspaceClient object instance via Azure Native Authentication. I am initializing this WorkspaceClient within a Databricks notebook, but I am trying to use the client to access the Jobs API of...

Get Started Discussions
authentication
azure
WorkspaceClient
Phani1
by Databricks MVP
  • 2492 Views
  • 1 replies
  • 0 kudos

Data masking best practices

Hi Team, could you please suggest any best practices/blogs on implementing data masking, row-level and column-level access control, role-based access control (RBAC), and attribute-based access control (ABAC)? Regards, Phanindra

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

Hi, can you check if this document answers your question? https://www.databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html

Sujitha
by Databricks Employee
  • 7244 Views
  • 0 replies
  • 3 kudos

Unity Catalog Governance Value Levers

What makes Unity Catalog a game-changer? The blog intricately dissects five main value levers: mitigating data and architectural risks, ensuring compliance, accelerating innovation, reducing platform complexity and costs while improving operational e...

kiko_roy
by Contributor
  • 1855 Views
  • 2 replies
  • 0 kudos

IsBlindAppend config change

Hello all, can someone please suggest how I can change the config IsBlindAppend from false to true? I need to do this not for a data table but for a custom log table. Also, is there any concern with toggling the value, in terms of standard practices? Please suggest.

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

IsBlindAppend is not a config but an operation metric that appears in the Delta Lake history. Its value changes based on the type of operation performed on the Delta table. https://docs.databricks.com/en/delta/history.html

1 More Replies
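
Since the reply above describes IsBlindAppend as something recorded per operation rather than set, the practical step is to read it from the table history. A minimal sketch, assuming the history rows (for example from DESCRIBE HISTORY) have already been fetched and converted to dictionaries. The helper name, table name, and sample rows are made up for illustration.

```python
from typing import Iterable, List


def non_blind_append_versions(history_rows: Iterable[dict]) -> List[int]:
    """Return the table versions whose commit was not a blind append."""
    return [row["version"] for row in history_rows if not row["isBlindAppend"]]


# Hypothetical rows, shaped like a subset of the columns returned by
# `DESCRIBE HISTORY my_log_table` (version, operation, isBlindAppend).
sample_history = [
    {"version": 2, "operation": "MERGE", "isBlindAppend": False},
    {"version": 1, "operation": "WRITE", "isBlindAppend": True},
    {"version": 0, "operation": "CREATE TABLE", "isBlindAppend": True},
]

print(non_blind_append_versions(sample_history))
```

Listing the versions this way shows which operations on the log table produced a false value, which is usually more useful than trying to force the flag.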
Phani1
by Databricks MVP
  • 1563 Views
  • 0 replies
  • 1 kudos

feasibility of using user groups

Hi Team, can you provide details on the feasibility of using user groups for granting access to both Personally Identifiable Information (PII) and non-PII data in SQL Pools? Regards, Phanindra

ChristianRRL
by Honored Contributor
  • 11203 Views
  • 2 replies
  • 3 kudos

DLT Primary Key Deduplication: Expectations vs. Constraints vs. Other?

I'm trying to figure out what's the best way to "de-duplicate" data via DLT. Currently, my only leads are: (1) "Manage data quality with Delta Live Tables | Databricks on AWS", via "Drop invalid records"; (2) "Constraints on Databricks | Databricks on AWS", via "pre-de...

Get Started Discussions
Auto Loader
autoloader
Delta Live Table
Delta Live Table Pipeline
dlt
Latest Reply
Palash01
Valued Contributor
  • 3 kudos

Hey @ChristianRRL, based on my understanding you want to de-duplicate your data during your DLT pipeline processing. Unfortunately, I was not able to find a solution to this when I ran into this problem, due to the native feature limitations. Limitations...

1 More Replies
Phani1
by Databricks MVP
  • 23971 Views
  • 2 replies
  • 1 kudos

ADF vs Databricks

Hi Team, I would appreciate your suggestion on when to choose ADF (Azure Data Factory) versus Databricks for orchestration, as well as any significant differences between them. Regards, Phanindra

Latest Reply
Michael_Galli
Contributor III
  • 1 kudos

Hi, I work with both, so it depends on the use case. ADF is easy to set up and good for data integration, e.g. a "copy data" job to transfer files from storage 1 to storage 2. ADF data flows (data transformations) can be used to some level, but when the tr...

1 More Replies
harvey-c
by New Contributor III
  • 3729 Views
  • 4 replies
  • 0 kudos

DLT Performance question with Unity Catalog

Dear Community Members, this question is about debugging a performance issue of a DLT pipeline with Unity Catalog. I had a DLT pipeline in Azure Databricks running on the local store, i.e. hive_metastore, and the process took about 2 hours with the auto scalain...

Latest Reply
Mystagon
New Contributor III
  • 0 kudos

Hey Harvey, I am getting around the same performance problems as you: from around 25 minutes in a normal workspace to 1 hour and 20 minutes in a UC workspace, which is roughly 3x slower. Did you manage to solve this? I've also noticed dbutils.fs.ls() is much ...

3 More Replies