Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Manthansingh
by New Contributor
  • 2830 Views
  • 2 replies
  • 0 kudos

Writing part files into a single text file

I want to write all my part files into a single text file. Is there anything I can do?

Latest Reply
Edthehead
Contributor III
  • 0 kudos

When writing a PySpark DataFrame to a file, it will always write to part files by default. This is because of partitions, even if there is only 1 partition. To write into a single file you can convert the PySpark DataFrame to a pandas DataFrame and ...
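Besides the pandas conversion mentioned in the reply, the part files can also be concatenated after the write. A minimal sketch, assuming the output directory and file names are hypothetical examples of Spark's usual `part-*` naming:

```python
import glob
import shutil

def merge_part_files(part_dir: str, output_path: str) -> int:
    """Concatenate all part-* files in a directory into one text file.

    Returns the number of part files merged. Paths are hypothetical;
    on Databricks you would point at a local or DBFS-mounted path.
    Files are merged in sorted name order, matching partition order.
    """
    part_files = sorted(glob.glob(f"{part_dir}/part-*"))
    with open(output_path, "wb") as out:
        for part in part_files:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(part_files)
```

Note that `df.coalesce(1).write.text(...)` also reduces the output to one part file, but it still lands inside a directory, so a post-write merge or rename like this is often needed either way.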

1 More Replies
herry
by New Contributor III
  • 6023 Views
  • 4 replies
  • 4 kudos

Resolved! Get the list of loaded files from Autoloader

Hello, We can use Autoloader to track which files have been loaded from an S3 bucket. My question about Autoloader: is there a way to read the Autoloader database to get the list of files that have been loaded? I can easily do this in AWS Glue j...
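Databricks exposes the `cloud_files_state` table-valued function for exactly this: it lists the files an Auto Loader stream has discovered, keyed by the stream's checkpoint location. A small sketch that builds the query (the checkpoint path is a hypothetical example; run the result with `spark.sql(...)` on Databricks):

```python
def loaded_files_query(checkpoint_path: str) -> str:
    """Build a SQL query listing files ingested by an Auto Loader stream,
    using Databricks' cloud_files_state() table-valued function.

    `checkpoint_path` is the stream's checkpoint location. The example
    path below is hypothetical.
    """
    return f"SELECT * FROM cloud_files_state('{checkpoint_path}')"

# Example (on Databricks):
# spark.sql(loaded_files_query("/mnt/checkpoints/bronze_stream")).show()
```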

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Herry Ramli - Would you be happy to mark Hubert's answer as best so that other members can find the solution more easily? Thanks!

3 More Replies
kumar_ravi
by New Contributor III
  • 827 Views
  • 0 replies
  • 0 kudos

DLT pipeline with Unity Catalog and external tables

We were using a DLT pipeline with our raw and enhanced layers (on Hive metastore) but recently upgraded to Unity Catalog. We have external tables (storing data on a different S3 bucket, with table metadata in Unity Catalog). At the moment DLT doesn't suppo...

marvin1
by New Contributor III
  • 13459 Views
  • 6 replies
  • 0 kudos

"Unable to upload to DBFS Query" Error running SQL warehouse query?

I have SQL warehouse endpoints that work fine when querying from applications such as Tableau, but just running the included sample query against a running endpoint from the Query Editor in the workspace returns "Unable to upload to DBFS Query...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Marvin Ginns Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

5 More Replies
hantha
by New Contributor
  • 2391 Views
  • 1 replies
  • 1 kudos

For dummies: How to avoid 'bill shock' & control AWS charges while learning to use Databricks?

Hi, I'm an out-of-work data analyst wanting to re-skill as a 'citizen data engineer'. By following how-to guides I was able to set up my own Databricks account, etc., along with a personal VPC in AWS. After 2 weeks of problem-free training I checked my...

Data Engineering
AWS
billing
nat gateway
Training
VPC
Latest Reply
holly
Databricks Employee
  • 1 kudos

Hi Hantha, Databricks needs VPCs to work, but there are the default ones and customer-managed ones: https://docs.databricks.com/en/security/network/classic/customer-managed-vpc.html Customer-managed ones are optional, but many tutorials include them as...

Antoine_B
by Contributor
  • 2943 Views
  • 2 replies
  • 0 kudos

Resolved! Row Filter on Unity Catalog Tables based on Unity Catalog group membership

Hello, I would like to prevent users belonging to a given Unity Catalog group ('restricted_users_group') from accessing some rows of a Unity Catalog table. For now, I was able to define a Row Filter function to prevent a list of users from accessing some rows, t...

Latest Reply
Antoine_B
Contributor
  • 0 kudos

Ok, so this problem needs no tricks. All was in the documentation. I did not know about the function IS_ACCOUNT_GROUP_MEMBER(). So this Row Filter function did the job: CREATE FUNCTION rd.my_schema.my_row_filter(filter_column INTEGER) RETURNS BOOLEAN RET...
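For readers hitting the same problem, a full-form sketch of a group-based row filter. The catalog, schema, table, column names, and the `filter_column = 42` predicate are all hypothetical illustrations, not the poster's actual definition; run each statement with `spark.sql(...)` on Databricks:

```python
def row_filter_ddl(group: str = "restricted_users_group") -> list:
    """Sketch of a Unity Catalog row filter using IS_ACCOUNT_GROUP_MEMBER().

    Returns the two SQL statements to execute: one creating the filter
    function, one binding it to a table. All object names below are
    hypothetical examples.
    """
    return [
        # Hide rows where filter_column = 42 from members of the group.
        f"CREATE OR REPLACE FUNCTION rd.my_schema.my_row_filter(filter_column INT) "
        f"RETURNS BOOLEAN "
        f"RETURN NOT (IS_ACCOUNT_GROUP_MEMBER('{group}') AND filter_column = 42)",
        # Attach the filter to the table on the chosen column.
        "ALTER TABLE rd.my_schema.my_table "
        "SET ROW FILTER rd.my_schema.my_row_filter ON (filter_column)",
    ]
```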

1 More Replies
Antoine_B
by Contributor
  • 1892 Views
  • 2 replies
  • 3 kudos

Resolved! Applying Row Filters to a table removes the ability to DEEP CLONE or SHALLOW CLONE this table

Hello, In this documentation I see some limitations that come with using Row Filters, like "Deep and shallow clones are not supported". We plan to use these Row Filters to hide sensitive data from some users. But not having CLONE available for tables with Row...

Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

Hi Antoine_B, The document explicitly calls out these limitations, but there is no explicit mention of a roadmap for support. I would request you to reach out to your Account Executive and SAs so that they can suggest an alternative approach for now. ...

1 More Replies
Neli
by New Contributor III
  • 3046 Views
  • 1 replies
  • 1 kudos

Resolved! Preferred way to read S3 - dbutils or Boto3 or better solution ?

We have a use case where a table has 15K rows and one of the columns holds an S3 location. We need to read each row from the table, fetch the S3 location from that column, and read its content from S3. Reading the content from S3, the workflow is taking a lot of time, ...

Latest Reply
Kannathasan
Databricks Partner
  • 1 kudos

Create an IAM role in AWS and use those credentials to connect from Databricks using the code below:
AWS_SECRET_ACCESS_KEY={{secrets/scope/aws_secret_access_key}}
AWS_ACCESS_KEY_ID={{secrets/scope/aws_access_key_id}}
aws_bucket_name = "my-s3-bucket"
df =...
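On the performance side of the original question, fetching 15K small objects one by one is usually latency-bound, so concurrent reads help a lot. A minimal sketch with the fetch function injected so it stays library-agnostic (with boto3 it could be something like `lambda key: s3.get_object(Bucket=bucket, Key=key)["Body"].read()`, where `s3` and `bucket` are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

def fetch_all(s3_paths: Iterable[str],
              fetch: Callable[[str], bytes],
              max_workers: int = 32) -> dict:
    """Fetch many S3 objects concurrently.

    S3 reads are I/O-bound, so a thread pool overlaps network latency
    instead of paying it once per object. Returns {path: content}.
    """
    paths = list(s3_paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fetch, paths)  # preserves input order
        return dict(zip(paths, results))
```

Tuning `max_workers` against the cluster driver's resources and the S3 request-rate limits is left to the reader; 32 is only a starting point.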

Nisha_Aggarwal
by New Contributor II
  • 1718 Views
  • 2 replies
  • 2 kudos

How to remove specific data from bronze layer

Hello Team, At my end, we bring data from Kafka and through Auto Loader we ingest it into the Bronze layer, and further into the Silver and Silver-plus layers. Lately, due to business changes, we need to delete specific data from the Bronze and Silver layers due to da...

Latest Reply
Nisha_Aggarwal
New Contributor II
  • 2 kudos

Hello, Thank you for your reply to my query! My bronze layer has data in JSON format and currently I need to remove 400 records from it. I also have a job set up in streaming mode. Could you please suggest how I can go further with it?
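For Delta tables, targeted removals like the 400 records above are typically done with a `DELETE` statement. A sketch that builds one (table and column names are hypothetical; run via `spark.sql(...)` on Databricks):

```python
def delete_records_sql(table: str, key_column: str, keys: list) -> str:
    """Build a Delta Lake DELETE for specific records (e.g. compliance
    removals). `table`, `key_column`, and `keys` are hypothetical
    placeholders for the real identifiers.

    Note: a streaming job reading this table downstream will see the
    delete as a change commit; it may need the skipChangeCommits option
    or a full refresh to keep running.
    """
    quoted = ", ".join(f"'{k}'" for k in keys)
    return f"DELETE FROM {table} WHERE {key_column} IN ({quoted})"
```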

1 More Replies
Ram-Dev7
by New Contributor
  • 2034 Views
  • 2 replies
  • 0 kudos

Query on using secret scope for dbt-core integration with databricks workflow

Hello all, I am currently configuring dbt-core with Azure Databricks Workflows and using Azure Databricks M2M (machine-to-machine) authentication for this setup. I have the cluster ID and cluster secret ID stored in a Databricks secret scope. I am seeking...

Latest Reply
RishabhTiwari07
Databricks Employee
  • 0 kudos

Hi @Ram-Dev7 , Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

1 More Replies
mbaas
by New Contributor III
  • 2071 Views
  • 1 replies
  • 2 kudos

DLT Serverless costs

I recently started checking out serverless Delta Live Tables. In my understanding, serverless continuous jobs (with Auto Loader) would only do something when new files arrive. However, for 4 serverless pipelines running continuously, I spend in two ...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

Hi @mbaas, That does not sound right. So, were you able to compare the jobs and stages and spot the difference? Are there more tasks added, or from a compute perspective do you not find a difference at all but only in cost? Also, it may be required to ...

rumfox
by New Contributor II
  • 4089 Views
  • 2 replies
  • 2 kudos

Maximum Number of Parameters in Databricks SQL Queries

Hello Databricks Community, I'm working with Databricks SQL and encountered an issue when passing a large number of parameters in a query. Specifically, I attempted to pass 493 parameters, but I received the following error message: BAD_REQUEST : Too m...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @rumfox, I would assume there is such a limit; the error is pretty clear. But the weird thing is, I cannot find any mention of it in the documentation.
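Whatever the undocumented limit turns out to be, a common workaround is to batch the parameter list, issue the query once per batch, and combine the results client-side. A sketch (the batch size of 256 is an assumption for illustration, not a documented Databricks value):

```python
def chunked(values: list, size: int = 256) -> list:
    """Split a long parameter list into batches below an API limit, so an
    IN (...) query can be issued once per batch and the result sets
    unioned client-side. 256 is an assumed limit for illustration only.
    """
    return [values[i:i + size] for i in range(0, len(values), size)]
```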

1 More Replies
dadrake3
by New Contributor II
  • 1100 Views
  • 1 replies
  • 1 kudos

Delta Live Tables Unity Catalog Insufficient Permissions

I am receiving the following error when I try to run my DLT pipeline with Unity Catalog enabled: ```raise Py4JJavaError( py4j.protocol.Py4JJavaError: An error occurred while calling o950.load. : org.apache.spark.SparkSecurityException: [INSUFFICIENT_P...

Latest Reply
dadrake3
New Contributor II
  • 1 kudos

I have also tried granting all permissions on the schema to myself and to all users, and neither helped.

mdelvaux
by New Contributor
  • 741 Views
  • 0 replies
  • 0 kudos

BigQuery as foreign catalog - full object structs

Hi - We have mounted BigQuery, hosting Google Analytics data, as a foreign catalog. When querying the tables, objects are returned as strings, with all keys obfuscated as "f" or "v", likely to avoid replicating object keys across all records and hence ...
